Abstract

We prove a long-standing conjecture which characterises the Ewens-Pitman two-parameter
family of exchangeable random partitions, plus a short list of limit and exceptional cases,
by the following property: for each $n = 2, 3, \ldots$, if one of $n$ individuals is chosen
uniformly at random, independently of the random partition $\pi_n$ of these individuals into
various types, and all individuals of the same type as the chosen individual are deleted,
then for each $r > 0$, given that $r$ individuals remain, these individuals are partitioned
according to $\pi'_r$ for some sequence of random partitions $(\pi'_r)$ which does not depend
on $n$. An analogous result characterizes the associated Poisson-Dirichlet family of random
discrete distributions by an independence property related to random deletion of a frequency
chosen by a size-biased pick. We also survey the regenerative properties of members of the
two-parameter family, and settle a question regarding the explicit arrangement of intervals
with lengths given by the terms of the Poisson-Dirichlet random sequence into the interval
partition induced by the range of a homogeneous neutral-to-the-right process.

1 Introduction

Kingman [14] introduced the concept of a partition structure, that is, a family of probability
distributions for random partitions $\pi_n$ of a positive integer $n$, with a sampling consistency
property as $n$ varies. Kingman's work was motivated by applications in population genetics,
where the partition of $n$ may be the allelic partition generated by randomly sampling a
set of $n$ individuals from a population of size $N \gg n$, considered in a large $N$ limit which
implies sampling consistency. Subsequent authors have established the importance of
Kingman's theory of partition structures, and representations of these structures in terms
of exchangeable random partitions and random discrete distributions [21], in a number of
other settings, which include the theory of species sampling [22], random trees and associated
random processes of fragmentation and coalescence [231 El El 12], Bayesian statistics
and machine learning [26, 27]. Kingman [13] showed that the Ewens sampling formula
from population genetics defines a particular partition structure $(\pi_n)$, which he characterized
by the following property, together with the regularity condition that $\mathbb{P}(\pi_n = \lambda) > 0$
for every partition $\lambda$ of $n$:

for each $n = 2, 3, \ldots$, if an individual is chosen uniformly at random independently
of a random partitioning of these individuals into various types
according to $\pi_n$, and all individuals of the same type as the chosen individual
are deleted, then conditionally given that the number of remaining individuals
is $r > 0$, these individuals are partitioned according to a copy of $\pi_r$.

We establish here a conjecture of Pitman [20] that if this property is weakened by replacing
$\pi_r$ by $\pi'_r$ for some sequence of random partitions $(\pi'_r)$, and a suitable regularity condition is
imposed, then $(\pi_n)$ belongs to the two-parameter family of partition structures introduced
in [20]. Theorem 3 below provides a more careful statement. We also present a corollary
of this result, to characterize the two-parameter family of Poisson-Dirichlet distributions
by an independence property of a single size-biased pick, thus improving upon [21].

Kingman's characterization of the Ewens family of partition structures by deletion 
of a type has been extended in another direction by allowing other deletion algorithms 
but continuing to require that the distribution of the partition structure be preserved. 
The resulting theory of regenerative partition structures [8] is connected to the theory of
regenerative sets, including Kingman's regenerative phenomenon [12] , on a multiplicative 
scale. In the last section of the paper we review such deletion properties of the two- 
parameter family of partition structures, and offer a new proof of a result of Pitman 
and Winkel [25] regarding the explicit arrangement of intervals with lengths given by the 
terms of the Poisson-Dirichlet random sequence into the interval partition induced by a 
multiplicatively regenerative set. 

2 Partition Structures 

This section briefly reviews Kingman's theory of partition structures, which provides 
the general context of this article. To establish some terminology and notation for use 
throughout the paper, recall that a composition $\lambda$ of a positive integer $n$ is a sequence
of positive integers $\lambda = (\lambda_1, \ldots, \lambda_k)$, with $\sum_{i=1}^{k} \lambda_i = n$. Both $k = k_\lambda$ and $n = n_\lambda$ may
be regarded as functions of $\lambda$. Each term $\lambda_i$ is called a part of $\lambda$. A partition $\lambda$ of $n$ is
a multiset of positive integers whose sum is $n$, commonly identified with the composition
of $n$ obtained by putting its positive integer parts in decreasing order, or with the infinite
sequence of non-negative integers obtained by appending an infinite string of zeros to this
composition of $n$. So

$$\lambda = (\lambda_1, \lambda_2, \ldots) \quad \text{with} \quad \lambda_1 \geq \lambda_2 \geq \cdots \geq 0$$

represents a partition of $n = n_\lambda$ into $k = k_\lambda$ parts, where

$$n_\lambda := \sum_i \lambda_i \quad \text{and} \quad k_\lambda := \max\{i : \lambda_i > 0\}.$$

Informally, a partition $\lambda$ describes an unordered collection of $n_\lambda$ balls of $k_\lambda$ different colors,
with $\lambda_i$ balls of the $i$th most frequent color. A random partition of $n$ is a random variable
$\pi_n$ with values in the finite set of all partitions $\lambda$ of $n$. Kingman [2] defined a partition
structure to be a sequence of random partitions $(\pi_n)_{n \in \mathbb{N}}$ which is sampling consistent in
the following sense:

if a ball is picked uniformly at random and deleted from $n$ balls randomly
colored according to $\pi_n$, then the random coloring of the remaining $n - 1$ balls
is distributed according to $\pi_{n-1}$.

As shown by Kingman [15], the theory of partition structures and associated partition-valued
processes is best developed in terms of random partitions of the set of positive
integers. Our treatment here follows [20]. If we regard a random partition $\pi_n$ of a positive
integer $n$ as a random coloring of $n$ unordered balls, an associated random partition $\Pi_n$
of the set $[n] := \{1, \ldots, n\}$ may be obtained by placement of the colored balls in a row.
We will assume for the rest of this introduction that this placement is made by a random
permutation which, given $\pi_n$, is uniformly distributed over all $n!$ possible orderings of $n$
distinct balls.

Formally, a partition of $[n]$ is a collection of disjoint non-empty blocks $\{B_1, \ldots, B_k\}$
with $\bigcup_{i=1}^{k} B_i = [n]$ for some $1 \leq k \leq n$, where each $B_i \subseteq [n]$ represents the set of places
occupied by balls of some particular color. We adopt the convention that the blocks $B_i$ are
listed in order of appearance, meaning that $B_i$ is the set of places in the row occupied by
balls of the $i$th color to appear. So $1 \in B_1$, and if $k \geq 2$ the least element of $B_2$ is the least
element of $[n] \setminus B_1$, if $k \geq 3$ the least element of $B_3$ is the least element of $[n] \setminus (B_1 \cup B_2)$,
and so on. This enumeration of blocks identifies each partition of $[n]$ with an ordered
partition $(B_1, \ldots, B_k)$, subject to these constraints. The sizes of parts $(|B_1|, \ldots, |B_k|)$
of this partition form a composition of $n$. The notation $\Pi_n = (B_1, \ldots, B_k)$ is used to
signify that $\Pi_n = \{B_1, \ldots, B_k\}$ for some particular sequence of blocks $(B_1, \ldots, B_k)$ listed
in order of appearance. If $\Pi_n$ is derived from $\pi_n$ by uniform random placement of balls
in a row, then $\Pi_n$ is exchangeable, meaning that its distribution is invariant under every
deterministic rearrangement of places by a permutation of $[n]$. Put another way, for each
partition $(B_1, \ldots, B_k)$ of $[n]$, with blocks in order of appearance,

$$\mathbb{P}(\Pi_n = (B_1, \ldots, B_k)) = p(|B_1|, \ldots, |B_k|) \tag{1}$$

for a function $p = p(\lambda)$ of compositions $\lambda$ of $n$ which is a symmetric function of its $k$ arguments
for each $1 \leq k \leq n$. Then $p$ is called the exchangeable partition probability function
(EPPF) associated with $\Pi_n$, or with $\pi_n$, the partition of $n$ defined by the unordered sizes
of blocks of $\Pi_n$.
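
The order-of-appearance convention can be made concrete with a small sketch. The function names `blocks_in_order_of_appearance` and `composition` are illustrative choices, not notation from the paper; the input is assumed to be a sequence giving the color of each ball in the row.

```python
def blocks_in_order_of_appearance(colors):
    """Group places 1..n by color; blocks come out ordered by least element."""
    blocks = {}
    for place, color in enumerate(colors, start=1):
        # dicts keep insertion order, so a block is created when its
        # color first appears, i.e. at the least element of the block
        blocks.setdefault(color, []).append(place)
    return list(blocks.values())


def composition(colors):
    """Sizes (|B_1|, ..., |B_k|) of the blocks in order of appearance."""
    return tuple(len(b) for b in blocks_in_order_of_appearance(colors))
```

For the coloring `"abacb"` of five balls this gives the blocks `[[1, 3], [2, 5], [4]]` and the composition `(2, 2, 1)` of 5.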

As observed by Kingman [15], $(\pi_n)$ is sampling consistent if and only if the sequence
of partitions $(\Pi_n)$ can be constructed to be consistent in the sense that for $m < n$ the
restriction of $\Pi_n$ to $[m]$ is $\Pi_m$. This amounts to a simple recursion formula satisfied by
$p$, recalled later as (20). The sequence $\Pi = (\Pi_n)$ can then be interpreted as a random
partition of the set $\mathbb{N}$ of all positive integers, whose restriction to $[n]$ is $\Pi_n$ for every $n$. Such
$\Pi$ consists of a sequence of blocks $B_1, B_2, \ldots$, which may be identified as random disjoint
subsets of $\mathbb{N}$, with $\bigcup_{i=1}^{\infty} B_i = \mathbb{N}$, where the nonempty blocks are arranged by increase of
their minimal elements, and if the number of nonempty blocks is some $K < \infty$, then by
convention $B_i = \emptyset$ for $i > K$. Similarly, $\Pi_n$ consists of a sequence of blocks $B_{ni} := B_i \cap [n]$,
where $\bigcup_i B_{ni} = [n]$, and the nonempty blocks are consistently arranged by increase of their
minimal elements, for all $n$.

These considerations are summarized by the following proposition: 

Proposition 1 (Kingman [15]) The most general partition structure, defined by a
sampling consistent collection of distributions for partitions $\pi_n$ of integers $n$, is associated
with a unique probability distribution of an exchangeable partition of positive integers
$\Pi = (\Pi_n)$, as determined by an EPPF $p$ according to (1).

We now recall a form of Kingman's paintbox construction of such an exchangeable
random partition $\Pi$ of positive integers. Regard the unit interval $[0, 1)$ as a continuous
spectrum of distinct colors, and suppose given a sequence of random variables $(P_1^{\downarrow}, P_2^{\downarrow}, \ldots)$
called ranked frequencies, subject to the constraints

$$1 \geq P_1^{\downarrow} \geq P_2^{\downarrow} \geq \cdots \geq 0, \qquad P_* := 1 - \sum_{j=1}^{\infty} P_j^{\downarrow} \geq 0. \tag{2}$$

The color spectrum is partitioned into a sequence of intervals $[l_i, r_i)$ of lengths $P_i^{\downarrow}$, and in
case $P_* > 0$ a further interval $[1 - P_*, 1)$ of length $P_*$. Each point $u$ of $[0, 1)$ is assigned
the color $c(u) = i$ if $u \in [l_i, r_i)$ for some $i = 1, 2, \ldots$, and $c(u) = u$ if $u \in [1 - P_*, 1)$.
This coloring of points of $[0, 1)$, called Kingman's paintbox associated with $(P_1^{\downarrow}, P_2^{\downarrow}, \ldots)$,
is sampled by an infinite sequence of independent uniform[0, 1] variables $U_i$, to assign a
color $c(U_i)$ to the $i$th ball in a row of balls indexed by $i = 1, 2, \ldots$. The associated color
partition of $\mathbb{N}$ is generated by the random equivalence relation $\sim$ defined by $i \sim j$ if and
only if $c(U_i) = c(U_j)$, meaning that either $U_i$ and $U_j$ fall in the same compartment of the
paintbox, or that $i = j$ and $U_i$ falls in $[1 - P_*, 1)$.
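
The paintbox sampling step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the ranked frequencies are given as a finite list (so any remaining mass is treated as the interval $[1 - P_*, 1)$), the intervals are laid out left to right, and the function name `paintbox_colors` is hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def paintbox_colors(ranked_freqs, n, rng):
    """Colors c(U_1), ..., c(U_n) for a finite list of ranked frequencies."""
    right_ends = list(accumulate(ranked_freqs))  # r_1, r_2, ... with l_1 = 0
    colors = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        i = bisect_right(right_ends, u)  # number of right endpoints <= u
        if i < len(right_ends):
            colors.append(i + 1)  # u lies in [l_{i+1}, r_{i+1})
        else:
            colors.append(u)  # u lies in [1 - P_*, 1): a color of its own
    return colors
```

Balls `i` and `j` then belong to the same block exactly when `colors[i] == colors[j]`; the float-valued colors are almost surely distinct, so they produce singleton blocks.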

Theorem 2 (Kingman's paintbox representation of exchangeable partitions [14]) Each
exchangeable partition $\Pi$ of $\mathbb{N}$ generates a sequence of ranked frequencies $(P_1^{\downarrow}, P_2^{\downarrow}, \ldots)$ such
that the conditional distribution of $\Pi$ given these frequencies is that of the color partition
of $\mathbb{N}$ derived from $(P_1^{\downarrow}, P_2^{\downarrow}, \ldots)$ by Kingman's paintbox construction. The exchangeable
partition probability function $p$ associated with $\Pi$ determines the distribution of $(P_1^{\downarrow}, P_2^{\downarrow}, \ldots)$,
and vice versa.

The distributions of ranked frequencies $(P_1^{\downarrow}, P_2^{\downarrow}, \ldots)$ associated with naturally arising
partition structures $(\pi_n)$ are quite difficult to deal with analytically. See for instance [21].
Still, $(P_1^{\downarrow}, P_2^{\downarrow}, \ldots)$ can be constructed as the decreasing rearrangement of the frequencies
$P_i$ of blocks $B_i$ of $\Pi$ defined as the almost sure limits

$$P_i = \lim_{n \to \infty} n^{-1} |B_{ni}| \tag{3}$$

where $i = 1, 2, \ldots$ indexes the blocks in order of appearance, while

$$P_* = 1 - \sum_{i=1}^{\infty} P_i = 1 - \sum_{i=1}^{\infty} P_i^{\downarrow}$$

is the asymptotic frequency of the union of singleton blocks

$$B_* := \bigcup_{\{i : |B_i| = 1\}} B_i,$$

so that (3) holds also for $i = *$. The frequencies are called proper if $P_* = 0$ a.s.; then almost
surely every nonempty block $B_i$ of $\Pi$ has a strictly positive frequency, hence $|B_i| = \infty$,
while every block $B_i$ with $0 < |B_i| < \infty$ is a singleton block.

The ranked frequencies $P_1^{\downarrow}, P_2^{\downarrow}, \ldots$ appear in the sequence $(P_i)$ in the order in which
intervals of these lengths are discovered by a process of uniform random sampling, as in
Kingman's paintbox construction. If $P_* > 0$ then in addition to the strictly positive terms
of $P_1^{\downarrow}, P_2^{\downarrow}, \ldots$ the sequence $(P_i)$ also contains infinitely many zeros which correspond to
singletons in $\Pi$. The conditional distribution of $(P_i)$ given $(P_i^{\downarrow})$ can also be described in
terms of iteration of a single size-biased pick, defined as follows. For a sequence of non-negative
random variables $(X_i)$ with $\sum_i X_i \leq 1$ and a random index $J \in \{1, 2, \ldots, \infty\}$,
call $X_J$ a size-biased pick from $(X_i)$ if $X_J$ has value $X_j$ if $J = j < \infty$ and $X_J = 0$ if
$J = \infty$, with

$$\mathbb{P}(J = j \mid (X_i, i \in \mathbb{N})) = X_j \qquad (0 < j < \infty) \tag{4}$$

(see [7] for this and another definition of size-biased pick in the case of improper frequencies).
The sequence derived from $(X_i)$ by deletion of $X_J$ and renormalization refers to the
sequence $(Y_i)$ obtained from $(X_i)$ by first deleting the $J$th term $X_J$, then closing up the
gap if $J \neq \infty$, and finally normalizing each term by $1 - X_J$. Here by convention, $(Y_i)$
$= (X_i)$ if $X_J = 0$ and $(Y_i)$ is the zero sequence if $X_J = 1$. Then $P_1$ is a size-biased
pick from $(P_i^{\downarrow})$, $P_2$ is a size-biased pick from the sequence derived from $(P_i^{\downarrow})$ by deletion
of $P_1$ and renormalization, and so on. For this reason, $(P_i)$ is said to be a size-biased
permutation of $(P_i^{\downarrow})$.
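
A size-biased permutation is easy to simulate when the sequence is finite and proper. The sketch below makes exactly that assumption (a finite list of non-negative numbers); the name `size_biased_permutation` is illustrative. Picking an index with probability proportional to the remaining terms is the same as a size-biased pick from the renormalized remainder.

```python
import random


def size_biased_permutation(freqs, rng):
    """Reorder a finite list of frequencies by iterated size-biased picks."""
    remaining = list(freqs)
    picked = []
    while remaining:
        if sum(remaining) <= 0:
            # only zero terms are left; their relative order is immaterial
            picked.extend(remaining)
            break
        # index j is chosen with probability remaining[j] / sum(remaining)
        j = rng.choices(range(len(remaining)), weights=remaining)[0]
        picked.append(remaining.pop(j))
    return picked
```

The output is always a rearrangement of the input, with larger terms tending to appear earlier.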



The two-parameter family It was shown in [20] that for each pair of real parameters
$(\alpha, \theta)$ with

$$0 \leq \alpha < 1, \qquad \theta > -\alpha \tag{5}$$

the formula

$$p_{\alpha,\theta}(\lambda) := \frac{\prod_{i=1}^{k-1} (\theta + i\alpha)}{(\theta + 1)_{n-1}} \prod_{j=1}^{k} (1 - \alpha)_{\lambda_j - 1}, \tag{6}$$

where $k = k_\lambda$, $n = n_\lambda$, and

$$(x)_n := x(x + 1) \cdots (x + n - 1) = \frac{\Gamma(x + n)}{\Gamma(x)}$$

is a rising factorial, defines the EPPF of an exchangeable random partition of positive
integers whose block frequencies $(P_i)$ in order of appearance admit the stick-breaking
representation

$$P_i = W_i \prod_{j=1}^{i-1} (1 - W_j) \tag{7}$$

for random variables $W_i$ such that

$$W_1, W_2, \ldots \text{ are mutually independent} \tag{8}$$

with

$$W_k \stackrel{d}{=} \beta_{1-\alpha,\, \theta + k\alpha} \tag{9}$$

where $\stackrel{d}{=}$ indicates equality in distribution, and $\beta_{a,b}$ for $a, b > 0$ denotes a random variable
with the beta$(a, b)$ density

$$\mathbb{P}(\beta_{a,b} \in du) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, u^{a-1} (1 - u)^{b-1}\, du \qquad (0 < u < 1) \tag{10}$$

which is also characterized by the moments

$$\mathbb{E}\left[\beta_{a,b}^{\,i} (1 - \beta_{a,b})^{j}\right] = \frac{(a)_i (b)_j}{(a + b)_{i+j}} \qquad (i, j = 0, 1, 2, \ldots). \tag{11}$$
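
The stick-breaking representation (7)-(9) translates directly into a sampler for the first few frequencies in order of appearance. The sketch below assumes the parameter range (5) with both beta parameters strictly positive; the function name `gem_frequencies` is an illustrative choice.

```python
import random


def gem_frequencies(alpha, theta, k, rng):
    """Sample P_1, ..., P_k from the stick-breaking scheme (7)-(9)."""
    freqs = []
    residual = 1.0  # product of (1 - W_j) over the sticks broken so far
    for i in range(1, k + 1):
        w = rng.betavariate(1 - alpha, theta + i * alpha)  # W_i as in (9)
        freqs.append(w * residual)  # P_i as in (7)
        residual *= 1 - w
    return freqs
```

For example, `gem_frequencies(0.5, 1.0, 25, random.Random(1))` returns 25 non-negative numbers whose sum does not exceed 1, the missing mass being carried by the frequencies not yet generated.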

Formula (6) also defines an EPPF for $(\alpha, \theta)$ in the range

$$\alpha < 0, \qquad \theta = -M\alpha \text{ for some } M \in \mathbb{N}, \tag{12}$$

in which case the stick-breaking representation with factors as in (9) makes sense
for $1 \leq k \leq M$, with the last factor $W_M = 1$. The frequencies $(P_1, \ldots, P_M)$ in this
case are a size-biased random permutation of $(Q_1, \ldots, Q_M)$ with the symmetric Dirichlet
distribution with $M$ parameters equal to $\nu := -\alpha > 0$. It is well known that the $Q_i$
can be constructed as $Q_i = \gamma_\nu^{(i)} / \Sigma$, $1 \leq i \leq M$, where $\Sigma = \sum_{i=1}^{M} \gamma_\nu^{(i)}$ and the $\gamma_\nu^{(i)}$ are
independent and identically distributed copies of a gamma variable $\gamma_\nu$ with density

$$\mathbb{P}(\gamma_\nu \in dx) = \Gamma(\nu)^{-1} x^{\nu - 1} e^{-x}\, dx \qquad (x > 0). \tag{13}$$

As shown by Kingman [13], the $(0, \theta)$ EPPF (6) for $\alpha = 0$, $\theta > 0$ arises in the limit of
random sampling from such symmetric Dirichlet frequencies as $\nu = -\alpha \downarrow 0$ and $M \uparrow \infty$
with $\nu M = \theta$ held fixed. In this case, the distribution of the partition $\pi_n$ is that determined
by the Ewens sampling formula with parameter $\theta$, the residual fractions $W_i$ in the stick-breaking
representation are identically distributed like $\beta_{1,\theta}$, and the ranked frequencies
$P_i^{\downarrow}$ can be obtained by normalization of the jumps of a gamma process with stationary
independent increments $(\gamma_\nu, 0 \leq \nu \leq \theta)$. Perman, Pitman and Yor [17] gave extensions
of this description to the case $0 < \alpha < 1$ when the distribution of ranked frequencies
can be derived from the jumps of a stable subordinator of index $\alpha$. See also [24, 18, 19]
for further discussion and applications to the description of ranked lengths of excursion
intervals of Brownian motion and Bessel processes.

In the limit case when $\nu = -\alpha \to \infty$ and $\theta = M\nu \to \infty$, for a fixed positive integer
$M$, the EPPF (6) converges to

$$p_M(\lambda) := \frac{M(M - 1) \cdots (M - k + 1)}{M^n}, \tag{14}$$

corresponding to sampling from $M$ equal frequencies

$$P_1 = P_2 = \cdots = P_M = 1/M$$

as in the classical coupon collector's problem with some fixed number $M$ of equally frequent
types of coupon. We refer to the collection of partition structures defined by (6)
for the parameter ranges (5) and (12), as well as the limit cases (14), as the extended
two-parameter family.
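
The convergence of (6) to (14) can be checked numerically with exact rational arithmetic. The sketch below is an illustration only: `rising`, `eppf` and `coupon_eppf` are hypothetical helper names implementing the rising factorial, formula (6) and formula (14) respectively, with parameters passed as `Fraction` values.

```python
from fractions import Fraction
from math import prod


def rising(x, n):
    """Rising factorial (x)_n = x (x + 1) ... (x + n - 1)."""
    return prod((x + i for i in range(n)), start=Fraction(1))


def eppf(lam, alpha, theta):
    """Two-parameter EPPF, formula (6), for a composition lam."""
    n, k = sum(lam), len(lam)
    numerator = prod((theta + i * alpha for i in range(1, k)), start=Fraction(1))
    numerator *= prod((rising(1 - alpha, l - 1) for l in lam), start=Fraction(1))
    return numerator / rising(theta + 1, n - 1)


def coupon_eppf(lam, M):
    """Limit EPPF, formula (14): sampling from M equally frequent types."""
    n, k = sum(lam), len(lam)
    return prod((Fraction(M - i) for i in range(k)), start=Fraction(1)) / Fraction(M) ** n
```

With $M = 3$ and $\nu = 10^9$, the value `eppf((2, 1), -nu, 3 * nu)` differs from `coupon_eppf((2, 1), 3)`, which equals 2/9, by far less than $10^{-6}$, and both functions vanish on compositions with more than $M$ parts.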

The partition $\mathbf{0}$ of $\mathbb{N}$ into singletons and the partition $\mathbf{1}$ of $\mathbb{N}$ into a single block both
belong to the closure of the two-parameter family. As noticed by Kerov [11], a mixture of
these two trivial partitions with mixing proportions $t$ and $1 - t$ also belongs to the closure,
as is seen from (6) by letting $\alpha \to 1$ and $\theta \to -1$ in such a way that $(1 - \alpha)/(\theta + 1) \to t$
and $(\theta + \alpha)/(\theta + 1) \to 1 - t$.

Characterizations by deletion properties The main focus of this paper is the following
result, which was first conjectured by Pitman [20]. For convenience in presenting
this result, we impose the following mild regularity condition on the EPPF $p$ associated
with a partition structure $(\pi_n)$:

$$p(2, 2, 1) > 0 \quad \text{and} \quad \lim_{n \to \infty} p(n) = 0. \tag{15}$$

Equivalently, in terms of the frequencies in order of appearance,

$$\mathbb{P}(0 < P_1 < P_1 + P_2 < 1) > 0 \quad \text{and} \quad \mathbb{P}(P_1 = 1) = 0, \tag{16}$$

or again, in terms of the ranked frequencies $P_i^{\downarrow}$,

$$\mathbb{P}(0 < P_2^{\downarrow},\ P_1^{\downarrow} + P_2^{\downarrow} < 1) > 0 \quad \text{and} \quad \mathbb{P}(P_1^{\downarrow} = 1) = 0. \tag{17}$$

Note that this regularity condition does not rule out the case of improper frequencies.
See Section 5 for discussion of how the following results can be modified to accommodate
partition structures not satisfying the regularity condition.

Theorem 3 Among all partition structures $(\pi_n)$ with EPPF $p$ subject to (15), the
extended two-parameter family is characterized by the following property:

if one of $n$ balls is chosen uniformly at random, independently of a random
coloring of these balls according to $\pi_n$, then given the number of other balls of
the same color as the chosen ball is $m - 1$, for some $1 \leq m < n$, the coloring of
the remaining $n - m$ balls is distributed according to $\pi'_{n-m}$ for some sequence
of partitions $(\pi'_1, \pi'_2, \ldots)$ which does not depend on $n$.

Moreover, if $(\pi_n)$ has the $(\alpha, \theta)$ EPPF (6), then $(\pi'_n)$ has the $(\alpha, \theta + \alpha)$ EPPF (6), whereas
if $(\pi_n)$ has the EPPF (14) for some $M$, then the EPPF of $(\pi'_n)$ has the same form except
with $M$ decremented by 1.

Note that it is not assumed as part of the property that $(\pi'_n)$ is a partition structure.
Rather, this is implied by the conclusion. Our formulation of Theorem 3 was inspired by
Kingman [13] who assumed also that $\pi'_n \stackrel{d}{=} \pi_n$ for all $n$. The conclusion then holds with
$\alpha = 0$, in which case the distribution of $\pi_n$ is that determined by the Ewens sampling
formula from population genetics.

In Section 4 we offer a proof of Theorem 3 by purely combinatorial methods. Some
preliminary results which we develop in Section 3 allow Theorem 3 to be reformulated in
terms of frequencies as in the following Corollary:

Corollary 4 Let the asymptotic frequencies $(P_i)$ of an exchangeable random partition
of positive integers $\Pi$ be represented in the stick-breaking form (7) for some sequence of
random variables $W_1 < 1, W_2, \ldots$. The condition

$$W_1 \text{ is independent of } (W_2, W_3, \ldots) \tag{18}$$

obtains if and only if

$$\text{the } W_i \text{ are mutually independent.} \tag{19}$$

If in addition to (18) the regularity condition (15) holds, then $\Pi$ is governed by the extended
two-parameter family, either with $W_i \stackrel{d}{=} \beta_{1-\alpha,\, \theta + i\alpha}$, or with $W_i = 1/(M - i + 1)$ for
$1 \leq i \leq M$, as in the limit case (14), for some $M = 3, 4, \ldots$.

The characterization of the two-parameter family using (19) rather than the weaker condition
(18) was provided by Pitman [21]. As we show in Section HI it is possible to derive
(19) directly from (18), without passing via Theorem 3.

The law of frequencies $(P_i)$ defined by the stick-breaking scheme (7) for independent
factors $W_i$ with $W_i \stackrel{d}{=} \beta_{1-\alpha,\, \theta + i\alpha}$ is known as the two-parameter Griffiths-Engen-McCloskey
distribution, denoted GEM$(\alpha, \theta)$. The property of the independence of residual
proportions $W_i$, also known as complete neutrality, has also been studied extensively
in connection with finite-dimensional Dirichlet distributions [3].

The above results can also be expressed in terms of ranked frequencies. Recall that the
distribution of ranked frequencies $(P_k^{\downarrow})$ of an $(\alpha, \theta)$-partition is known as the two-parameter
Poisson-Dirichlet distribution PD$(\alpha, \theta)$. According to the previous discussion, a random
sequence $(P_k^{\downarrow})$ with PD$(\alpha, \theta)$ distribution is obtained by ranking a sequence $(P_i)$ with
GEM$(\alpha, \theta)$ distribution. The PD$(\alpha, \theta)$ distribution was systematically studied in [24],
and has found numerous further applications to random trees and associated processes of
fragmentation and coagulation [23, 19, 2].

Corollary 5 Let $(P_k^{\downarrow})$ be a decreasing sequence of ranked frequencies subject to the regularity
condition (15) and (17). For $P_J^{\downarrow}$ a size-biased pick from $(P_k^{\downarrow})$, let $(Q_k^{\downarrow})$ be derived
from $(P_k^{\downarrow})$ by deletion of $P_J^{\downarrow}$ and renormalization. The random variable $P_J^{\downarrow}$ is independent
of the sequence $(Q_k^{\downarrow})$ if and only if either the distribution of $(P_k^{\downarrow})$ is PD$(\alpha, \theta)$ for some
$(\alpha, \theta)$, or $P_k^{\downarrow} = 1/M$ for all $1 \leq k \leq M$, for some $M \geq 3$. In the former case, the distribution
of $(Q_k^{\downarrow})$ is PD$(\alpha, \theta + \alpha)$, whereas in the latter case, the deletion and renormalization
simply decrements $M$ by one.

The 'if' part of this Corollary is Proposition 34 of Pitman-Yor [24], while the 'only if'
part follows easily from Corollary 4 using Kingman's paintbox representation.



3 Partially Exchangeable Partitions 

We start by recalling from [20] some basic properties of partially exchangeable partitions
of positive integers, which are consistent sequences $\Pi = (\Pi_n)$, where $\Pi_n$ is a partition
of $[n]$ whose probability distribution is of the form (1) for some function $p = p(\lambda)$ of
compositions $\lambda$ of positive integers. The consistency of $\Pi_n$ as $n$ varies amounts to the
addition rule

$$p(\lambda) = \sum_{j=1}^{k+1} p(\lambda^{(j)}), \tag{20}$$

where $k = k_\lambda$ is the number of parts of $\lambda$, and $\lambda^{(j)}$ is the composition of $n_\lambda + 1$ derived
from $\lambda$ by incrementing $\lambda_j$ to $\lambda_j + 1$, and leaving all other components of $\lambda$ fixed. In
particular, for $j = k_\lambda + 1$ this means appending a 1 to $\lambda$. There is also the normalization
condition $p(1) = 1$. To illustrate (20) for $\lambda = (3, 1, 2)$:

$$p(3, 1, 2) = p(4, 1, 2) + p(3, 2, 2) + p(3, 1, 3) + p(3, 1, 2, 1).$$
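
The addition rule (20) can be verified exactly for the two-parameter EPPF (6) using rational arithmetic. This is a sketch for illustration: `rising`, `eppf` and `increments` are hypothetical helper names, and the parameter values in the usage note are arbitrary choices inside the range (5).

```python
from fractions import Fraction
from math import prod


def rising(x, n):
    """Rising factorial (x)_n = x (x + 1) ... (x + n - 1)."""
    return prod((x + i for i in range(n)), start=Fraction(1))


def eppf(lam, alpha, theta):
    """Two-parameter EPPF, formula (6), for a composition lam."""
    n, k = sum(lam), len(lam)
    numerator = prod((theta + i * alpha for i in range(1, k)), start=Fraction(1))
    numerator *= prod((rising(1 - alpha, l - 1) for l in lam), start=Fraction(1))
    return numerator / rising(theta + 1, n - 1)


def increments(lam):
    """The compositions lam^(j), j = 1, ..., k + 1, appearing in (20)."""
    out = [lam[:j] + (lam[j] + 1,) + lam[j + 1:] for j in range(len(lam))]
    out.append(lam + (1,))  # j = k + 1: append a new part equal to 1
    return out
```

With `a, t = Fraction(1, 3), Fraction(5, 2)`, the list `increments((3, 1, 2))` reproduces the four compositions displayed above, and `eppf((3, 1, 2), a, t)` equals the sum of `eppf(mu, a, t)` over that list.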

The following proposition recalls the analog of Kingman's representation for partially
exchangeable partitions:

Proposition 6 (Corollary 7 from [20]) Every partially exchangeable partition of positive
integers $\Pi$ is such that for each $k \geq 1$, the $k$th block has an almost sure limit
frequency $P_k$. The partition probability function $p$ can then be presented as

$$p(\lambda) = \mathbb{E}\left[\prod_{j=1}^{k} P_j^{\lambda_j - 1} \prod_{i=1}^{k-1} R_i\right] \tag{21}$$

where $k = k_\lambda$ and $R_j := (1 - P_1 - \cdots - P_j)$. Alternatively, in terms of the residual fractions
$W_k$ in the stick-breaking representation (7):

$$p(\lambda) = \mathbb{E}\left[\prod_{i=1}^{k} W_i^{\lambda_i - 1}\, \overline{W}_i^{\,\Lambda_{i+1}}\right] \tag{22}$$

where $\overline{W}_i := 1 - W_i$, $\Lambda_j := \sum_{i \geq j} \lambda_i$. This formula sets up a correspondence between
the probability distribution of $\Pi$, encoded by the partition probability function $p$, and an
arbitrary joint distribution of a sequence of random variables $(W_1, W_2, \ldots)$ with $0 \leq W_i \leq 1$
for all $i$.

In terms of randomly coloring a row of $n_\lambda$ balls, the product whose expectation appears
in (22) is the conditional probability given $W_1, W_2, \ldots$ of the event that the first $\lambda_1$ balls
are colored one color, the next $\lambda_2$ balls another color, and so on. So (22) reflects the
fact that conditionally given $W_1, W_2, \ldots$ the process of random coloring of integers occurs
according to the following residual allocation scheme [20, Construction 16]:

Ball 1 is painted a first color, and so is each subsequent ball according to a
sequence of independent trials with probability $W_1$ of painting with color 1.
The set of balls so painted defines the first block $B_1$ of $\Pi$. Conditionally given
$B_1$, the first unpainted ball is painted a second color, and so is each subsequent
unpainted ball according to a sequence of independent trials with probability
$W_2$ of painting with color 2. The balls colored 2 define $B_2$, and so on. Given an
arbitrary sequence of random variables $(W_k)$ with $0 \leq W_k \leq 1$, this coloring
scheme shows how to construct a partially exchangeable partition of $\mathbb{N}$ whose
asymptotic block frequencies are given by the stick-breaking scheme (7).
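
The residual allocation scheme can be simulated for a finite row of balls. The sketch below assumes the residual fractions are supplied as a list `ws` with at least as many entries as colors that end up being used (a list of length `n` always suffices); the function name `residual_allocation` is illustrative.

```python
import random


def residual_allocation(ws, n, rng):
    """Color balls 1..n given residual fractions ws = [W_1, W_2, ...]."""
    color = [0] * n  # 0 marks a ball that is not yet painted
    k = 0
    while 0 in color:
        k += 1
        w = ws[k - 1]
        first = color.index(0)
        color[first] = k  # the first unpainted ball always starts color k
        for i in range(first + 1, n):
            # each later unpainted ball gets color k with probability W_k
            if color[i] == 0 and rng.random() < w:
                color[i] = k
    return color
```

By construction the colors appear in the order 1, 2, 3, ... along the row, so the blocks are automatically listed in order of appearance; a value `W_k = 1` paints every remaining ball with color `k`, and `W_k = 0` produces a singleton.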

Note that the residual allocation scheme terminates at the first $k$, if any, such that $W_k = 1$,
by painting all remaining balls color $k$. The values of $W_i$ for $i$ larger than such a $k$ have
no effect on the construction of $\Pi$, so cannot be recovered from its almost sure limit
frequencies. To ensure that a unique joint distribution of $(W_1, W_2, \ldots)$ is associated with
each $p$, the convention may be adopted that the sequence $(W_i)$ terminates at the first $k$,
if any, such that $W_k = 1$. This convention will be adopted in the following discussion.
For $W_i$ which are independent, formula (22) factorizes as

$$p(\lambda) = \prod_{i=1}^{k} \mathbb{E}\left(W_i^{\lambda_i - 1}\, \overline{W}_i^{\,\Lambda_{i+1}}\right). \tag{23}$$

In particular, for independent $W_i$ with the beta distributions (9), this formula is readily
evaluated using (11) to obtain (6). Inspection of (6) shows that this function of compositions
$\lambda$ is a symmetric function of its parts. Hence the associated random partition $\Pi$ is
exchangeable.
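
The evaluation of (23) via the beta moments (11) can be checked against (6) with exact arithmetic. The following is a sketch under the assumption that the parameters are given as `Fraction` values in the range (5); the helper names `rising`, `eppf`, `beta_moment` and `eppf_via_sticks` are illustrative.

```python
from fractions import Fraction
from math import prod


def rising(x, n):
    """Rising factorial (x)_n = x (x + 1) ... (x + n - 1)."""
    return prod((x + i for i in range(n)), start=Fraction(1))


def eppf(lam, alpha, theta):
    """Two-parameter EPPF, formula (6)."""
    n, k = sum(lam), len(lam)
    numerator = prod((theta + i * alpha for i in range(1, k)), start=Fraction(1))
    numerator *= prod((rising(1 - alpha, l - 1) for l in lam), start=Fraction(1))
    return numerator / rising(theta + 1, n - 1)


def beta_moment(a, b, i, j):
    """E[beta^i (1 - beta)^j] for a beta(a, b) variable, formula (11)."""
    return rising(a, i) * rising(b, j) / rising(a + b, i + j)


def eppf_via_sticks(lam, alpha, theta):
    """Product formula (23) with W_i distributed as in (9)."""
    out = Fraction(1)
    for i in range(1, len(lam) + 1):
        tail = sum(lam[i:])  # Lambda_{i+1} = lam_{i+1} + ... + lam_k
        out *= beta_moment(1 - alpha, theta + i * alpha, lam[i - 1] - 1, tail)
    return out
```

For instance, with $\alpha = 1/4$ and $\theta = 3/2$ the two functions agree on $(3, 1, 2)$ and on its rearrangement $(1, 2, 3)$, which also illustrates the symmetry of (6).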

There is an alternate sequential construction of the two-parameter family of partitions
which has become known as the "Chinese Restaurant Process" (see [19], Chapter 3). Instead
of coloring rows of balls, imagine customers entering a restaurant with an unlimited
number of tables. Initially customer 1 sits at table 1. At stage $n$, if there are $k$ occupied
tables, the $i$th of them occupied by $\lambda_i$ customers for $1 \leq i \leq k$, customer $n + 1$ sits at
the $i$th occupied table with probability $(\lambda_i - \alpha)/(n + \theta)$, and occupies a
new table $k + 1$ with probability $(\theta + k\alpha)/(n + \theta)$. It is then readily checked that for each
partition of $[n]$ into blocks $B_i$ with $|B_i| = \lambda_i$, after $n$ customers labeled by $[n]$ have entered
the restaurant, the probability that those customers labeled by $B_i$ sat at table $i$ for each
$1 \leq i \leq k_\lambda$ is given by the product formula (6). Moreover, the stick-breaking description
of the limit frequencies $P_i$ is readily derived from the Polya urn-scheme description of
exchangeable trials which, given a beta$(a, b)$-distributed variable $S$, are independent with
success probability $S$.
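
The claim that the seating probabilities multiply out to (6) can be checked on a concrete seating. The sketch below is illustrative: `crp_probability` is a hypothetical name, the seating is given as the table number chosen by each successive customer (tables numbered in order of first occupation), and exact `Fraction` parameters are assumed.

```python
from fractions import Fraction
from math import prod


def rising(x, n):
    """Rising factorial (x)_n = x (x + 1) ... (x + n - 1)."""
    return prod((x + i for i in range(n)), start=Fraction(1))


def eppf(lam, alpha, theta):
    """Two-parameter EPPF, formula (6)."""
    n, k = sum(lam), len(lam)
    numerator = prod((theta + i * alpha for i in range(1, k)), start=Fraction(1))
    numerator *= prod((rising(1 - alpha, l - 1) for l in lam), start=Fraction(1))
    return numerator / rising(theta + 1, n - 1)


def crp_probability(tables, alpha, theta):
    """Probability that customers 1, 2, ... choose the given table numbers."""
    sizes = [1]  # customer 1 sits at table 1 with probability one
    prob = Fraction(1)
    for n, t in enumerate(tables[1:], start=1):  # n customers already seated
        if t <= len(sizes):
            prob *= (sizes[t - 1] - alpha) / (n + theta)  # join table t
            sizes[t - 1] += 1
        else:
            prob *= (theta + len(sizes) * alpha) / (n + theta)  # new table
            sizes.append(1)
    return prob
```

The seating `[1, 2, 1, 3, 1, 3]` puts customers $\{1, 3, 5\}$, $\{2\}$ and $\{4, 6\}$ at tables 1, 2 and 3, and its probability coincides with `eppf((3, 1, 2), alpha, theta)`.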

Continuing the consideration of a partially exchangeable partition $\Pi$ of positive integers,
we record the following Lemma.

Lemma 7 Let $\Pi$ be a partially exchangeable random partition of $\mathbb{N}$ with partition probability
function $p$, and with blocks $B_1, B_2, \ldots$ and residual frequencies $W_1, W_2, \ldots$ such that
$W_1 < 1$ almost surely. Let $\Pi'$ denote the partition of $\mathbb{N}$ derived from $\Pi$ by deletion of the
block $B_1$ containing 1 and re-labeling of $\mathbb{N} \setminus B_1$ by the increasing bijection with $\mathbb{N}$. Then
the following hold:

(i) The partition $\Pi'$ is partially exchangeable, with partition probability function

$$p'(\lambda_2, \ldots, \lambda_k) = \sum_{\lambda_1 = 1}^{\infty} \binom{\lambda_1 + \cdots + \lambda_k - 2}{\lambda_1 - 1}\, p(\lambda_1, \lambda_2, \ldots, \lambda_k) \tag{24}$$

and residual frequencies $W_2, W_3, \ldots$.

(ii) If $\Pi$ is exchangeable, then so is $\Pi'$.

(iii) For $1 \leq m \leq n$

$$q(n : m) := \mathbb{P}(B_1 \cap [n] = [m]) = \mathbb{E}\left(W_1^{m-1}\, \overline{W}_1^{\,n-m}\right), \tag{25}$$

and there is the addition rule

$$q(n : m) = q(n + 1 : m + 1) + q(n + 1 : m). \tag{26}$$

(iv) Let $T_n := \inf\{m : |[n + m] \setminus B_1| = n\}$, which is the number of balls of the first color
preceding the $n$th ball not of the first color. Then

$$\mathbb{P}(T_n = m) = \binom{m + n - 2}{m - 1}\, q(n + m : m), \tag{27}$$

and consequently

$$\sum_{m=1}^{\infty} \binom{m + n - 2}{m - 1}\, q(n + m : m) = 1. \tag{28}$$

Proof. Formula (25) is read from the general construction of $B_1$ given $W_1$ by assigning
each $i \geq 2$ to $B_1$ independently with the same probability $W_1$. The formulas (24) and (27)
are then seen to be marginalizations of the following expression for the joint distribution
of $T_n$ and $\Pi'_n$, the restriction of $\Pi'$ to $[n]$:

$$\mathbb{P}(T_n = m,\ \Pi'_n = (C_1, \ldots, C_{k-1})) = \binom{m + n - 2}{m - 1}\, p(m, |C_1|, \ldots, |C_{k-1}|) \tag{29}$$

for every partition $(C_1, \ldots, C_{k-1})$ of $[n]$. To check (29), observe that the event in question
occurs if and only if $\Pi_{n+m} = (B_1, \ldots, B_k)$ for some blocks $B_i$ with $|B_1| = m$ and $|B_i| =
|C_{i-1}|$ for $2 \leq i \leq k$. Once $B_1$ is chosen, each $B_i$ for $2 \leq i \leq k$ is the image of $C_{i-1}$ via the
increasing bijection from $[n]$ to $[n + m] \setminus B_1$. For prescribed $C_{i-1}$, $2 \leq i \leq k$, the choice
of $B_1 \subset [n + m]$ is arbitrary subject to the constraint that $1 \in B_1$ and $n + m \notin B_1$. The
number of choices is the binomial coefficient in (29), so the conclusion is evident. □
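
Formulas (26) and (28) can be checked in the simplest special case, chosen here purely for illustration, where $W_1$ is a constant $w < 1$, so that $q(n : m) = w^{m-1}(1 - w)^{n-m}$ and $T_n - 1$ has a negative binomial distribution. The helper names `q` and `total_mass` are hypothetical.

```python
from fractions import Fraction
from math import comb


def q(n, m, w):
    """q(n : m) from (25) when W_1 is the constant w."""
    return w ** (m - 1) * (1 - w) ** (n - m)


def total_mass(n, w, terms):
    """Partial sum of the series (28) over m = 1, ..., terms."""
    return sum(comb(m + n - 2, m - 1) * q(n + m, m, w) for m in range(1, terms + 1))
```

With `w = Fraction(3, 10)` the addition rule (26) holds exactly, and the partial sums `total_mass(4, w, terms)` increase towards 1 geometrically fast as `terms` grows.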

The connection between Theorem [3] and Corollary |4] is established by the following 
Lemma: 
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Lemma 8 Let U be a partially exchangeable partition of N with residual frequencies Wi 
such that ¥(Wi < 1) = 1, with the convention that the sequence terminates at the first k 
(if any) such that Wk = 1, so the joint distribution of (Wi) is determined uniquely by the 
partition probability function p of IT, and vice versa, according to formula (122]) . For B\ 
the first block of II with frequency W\, let IT be derived from U by deleting block B\ and 
relabeling the remaining elements as in Lemma The following four conditions on II are 
equivalent: 

(i) W\ is independent of (W 2 , W 3 , . . .). 

(ii) The partition probability function $p$ of $\Pi$ admits a factorization of the following form, for all compositions $\lambda$ of positive integers with $k \ge 2$ parts:

\[ p(\lambda) = q(n_1 : \lambda_1)\, p'(\lambda_2, \ldots, \lambda_k) \tag{30} \]

for some non-negative functions $q(n : m)$ and $p'(\lambda_2, \ldots, \lambda_k)$.

(iii) For each $1 \le m < n$, the conditional distribution of $\Pi'_{n-m}$ given $|B_1 \cap [n]| = m$ depends only on $n - m$.

(iv) The random set $B_1$ is independent of the random partition $\Pi'$ of $\mathbb{N}$.

Finally, if these conditions hold, then (ii) holds in particular for $q(n : m)$ as in (25) and $p'(\lambda_2, \ldots, \lambda_k)$ the partition probability function of $\Pi'$.

Proof. That (i) implies (ii) is immediate by combination of the moment formula (22), (24) and (25). Conversely, if (ii) holds for some $q(n : m)$ and $p'(\lambda_2, \ldots, \lambda_k)$, Lemma 7 implies easily that (ii) holds for $q$ and $p'$ as in that Lemma. So (ii) gives a formula of the form

\[ E[f(W_1)\, g(W_2, W_3, \ldots)] = E[f(W_1)]\, E[g(W_2, W_3, \ldots)] \tag{31} \]

where $g$ ranges over a collection of bounded measurable functions whose expectations determine the law of $W_2, W_3, \ldots$, and for the $g$ associated with $\lambda_2, \ldots, \lambda_k$, the function $f(w)$ ranges over the polynomials $w^{m-1}(1 - w)^{n}$ where $m = \lambda_1 \in \mathbb{N}$ and $n = n_1 - \lambda_1 = \sum_{j=2}^{k} \lambda_j$. But linear combinations of these polynomials can be used to uniformly approximate any bounded continuous function of $w$ on $[0, 1]$ which vanishes in a neighbourhood of 1. It follows that (31) holds for all such $f$, for each $g$, hence the full independence condition (i). Lastly, the equivalence of (ii), (iii) and (iv) is easily verified. □

4 Exchangeable Partitions

For a block $B$ of a random partition $\Pi_n$ of $[n]$ with $|B| = m$, let $\Pi_n \setminus B$ denote the partition of $[n - m]$ obtained by first deleting the block $B$ of $\Pi_n$, then mapping the restriction of $\Pi_n$ to $[n] \setminus B$ to a partition of $[n - m]$ via the increasing bijection between $[n] \setminus B$ and $[n - m]$. In terms of a coloring of $n$ balls in a row, this means deleting all $m$ balls of some color, then closing up the gaps between remaining balls, to obtain a coloring of $n - m$ balls in a row. Theorem 3 can be formulated a little more sharply as follows:

Theorem 9 Among all exchangeable partitions $(\Pi_n)$ of positive integers with EPPF $p$ subject to (15), the extended two-parameter family is characterized by the following property:

if $B_{n1}$ denotes the random block of $\Pi_n$ containing 1, then for each $1 \le m < n$, conditionally given $B_{n1}$ with $|B_{n1}| = m$, the partition $\Pi_n \setminus B_{n1}$ has the same distribution as $\Pi'_{n-m}$ for some sequence of partitions $\Pi'_1, \Pi'_2, \ldots$ which does not depend on $n$.

Moreover, if $(\Pi_n)$ is an $(\alpha, \theta)$ partition, then we can take for $(\Pi'_n)$ the exchangeable $(\alpha, \theta + \alpha)$ partition of $\mathbb{N}$.
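The last assertion of Theorem 9 can be checked in exact arithmetic. The sketch below assumes that the EPPF of the $(\alpha, \theta)$ partition has the standard Ewens-Pitman product form, $p_{\alpha,\theta}(\lambda) = \prod_{i=1}^{k-1}(\theta + i\alpha) \prod_{j}(1-\alpha)_{\lambda_j - 1} / (\theta+1)_{n-1}$. The function names `rising`, `eppf` and `first_part_factor` are illustrative and not from the paper. The ratio $p_{\alpha,\theta}(\lambda) / p_{\alpha,\theta+\alpha}(\lambda_2, \ldots, \lambda_k)$ should then depend on $\lambda$ only through $n$ and $\lambda_1$, as the factorization (30) requires.

```python
from fractions import Fraction


def rising(x, n):
    """Rising factorial (x)_n = x (x + 1) ... (x + n - 1), with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def eppf(lam, alpha, theta):
    """Assumed standard two-parameter EPPF evaluated at a composition lam."""
    n, k = sum(lam), len(lam)
    value = Fraction(1)
    for i in range(1, k):
        value *= theta + i * alpha
    for part in lam:
        value *= rising(1 - alpha, part - 1)
    return value / rising(theta + 1, n - 1)


def first_part_factor(lam, alpha, theta):
    """Ratio p_{alpha,theta}(lam) / p_{alpha,theta+alpha}(lam without first part)."""
    return eppf(lam, alpha, theta) / eppf(lam[1:], alpha, theta + alpha)


alpha, theta = Fraction(1, 3), Fraction(1, 2)
# Three compositions of n = 6 sharing the first part m = 2.
for lam in [(2, 3, 1), (2, 1, 1, 2), (2, 4)]:
    print(lam, first_part_factor(lam, alpha, theta))
```

Under this assumption the common value is the beta moment $(1-\alpha)_{m-1}(\theta+\alpha)_{n-m}/(\theta+1)_{n-1}$, which plays the role of $q(n : m)$.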

For an arbitrary partition $\Pi_n$ of $[n]$ with blocks listed in the order of appearance, define $J_n$ as the index of the block containing an element chosen from $[n]$ uniformly at random, independently of $\Pi_n$. We call the block $B_{nJ_n}$ a size-biased pick from the sequence of blocks. Note that this definition agrees with (1) in the sense that the number $|B_{nJ_n}|/n$ is a size-biased pick from the numerical sequence $(|B_{nj}|/n,\ j = 1, 2, \ldots)$, because given a sequence of blocks of the partition $\Pi_n$ the value $J_n = j$ is taken with probability $|B_{nj}|/n$. Assuming $\Pi_n$ exchangeable, the size of the block $|B_{n1}|$ has the same distribution as $|B_{nJ_n}|$ conditionally given the ranked sequence of block-sizes, and the reduced partitions $\Pi_n \setminus B_{n1}$ and $\Pi_n \setminus B_{nJ_n}$ also have the same distributions. The equivalence of Theorem 3 and Theorem 9 is evident from these considerations.
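The two operations used above, deletion of a block followed by relabeling through the increasing bijection, and the size-biased pick of a block, can be sketched as follows. This is a minimal illustration; the function names and the encoding of a partition as a string of type labels are assumptions made for the example.

```python
import random


def blocks_in_order(labels):
    """Blocks of a partition of [n], listed in order of appearance.

    labels[i] is the type of element i + 1.
    """
    blocks = {}
    for element, label in enumerate(labels, start=1):
        blocks.setdefault(label, []).append(element)
    return list(blocks.values())  # dicts preserve insertion order


def delete_block(blocks, j):
    """Delete blocks[j], then relabel survivors by the increasing bijection."""
    removed = set(blocks[j])
    survivors = sorted(x for block in blocks for x in block if x not in removed)
    relabel = {x: rank for rank, x in enumerate(survivors, start=1)}
    return [[relabel[x] for x in block] for i, block in enumerate(blocks) if i != j]


def size_biased_index(blocks, rng=random):
    """Index of the block containing a uniformly chosen element of [n]."""
    n = sum(len(block) for block in blocks)
    chosen = rng.randint(1, n)
    return next(j for j, block in enumerate(blocks) if chosen in block)


blocks = blocks_in_order("abacb")
print(blocks)                   # [[1, 3], [2, 5], [4]]
print(delete_block(blocks, 0))  # [[1, 3], [2]]
```

Index `j = 0` corresponds to the block $B_{n1}$ containing 1, while `size_biased_index` returns the (zero-based) index $J_n$.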

We turn to the proof of Theorem 9. The condition considered in Theorem 9 is just that considered in Lemma 8(iii), so we can work with the equivalent factorization condition (30). We now invoke the symmetry of the EPPF for an exchangeable $\Pi$. Suppose that an EPPF $p$ admits the factorization (30), and rewrite the identity (30) in the form

\[ p(m, \lambda) = \frac{q(|\lambda| + m : m)}{q(|\lambda| + 1 : 1)}\, p(1, \lambda). \]

For this expression we must have non-zero denominator, but this is assured by $P(0 < W_1 < 1) > 0$, which is implied by the regularity condition (15). Instead of part $m$ in $p(m, \lambda)$, we have now 1 in $p(1, \lambda)$. But $p$ is symmetric, hence we can iterate, eventually reducing each part to 1.

Let $\lambda = (\lambda_1, \ldots, \lambda_k)$ be a generic composition, and denote $\Lambda_j = \lambda_j + \cdots + \lambda_k$ the tail sums, thus $\Lambda_1 = |\lambda|$. Iteration yields

\[ p(\lambda) = \frac{q(\Lambda_1 : \lambda_1)}{q(1 + \Lambda_2 : 1)}\, \frac{q(1 + \Lambda_2 : \lambda_2)}{q(2 + \Lambda_3 : 1)} \cdots \frac{q(k - 2 + \Lambda_{k-1} : \lambda_{k-1})}{q(k - 1 + \Lambda_k : 1)}\, \frac{q(k - 1 + \Lambda_k : \lambda_k)}{q(k : 1)}\, p(1^k), \tag{32} \]

where $p(1^k)$ is the probability of the singleton partition of $[k]$. This leads to the following lemma, which is a simplification of [21, Lemma 12]:

Lemma 10 Suppose that an EPPF $p$ satisfies the factorization condition (30) and the regularity condition (15). Then

(i) either

\[ q(n : m) = \frac{(a)_{m-1}\,(b)_{n-m}}{(a + b)_{n-1}} \]

for some $a, b > 0$, corresponding to $W_1$ with beta$(a, b)$ distribution,

(ii) or

\[ q(n : m) = c^{m-1}(1 - c)^{n-m} \]

for some $0 < c < 1$, corresponding to $W_1 = c$, in which case necessarily $c = 1/M$ for some $M \ge 3$.

Proof. By symmetry and the assumption that $p(2, 2, 1) > 0$, it is easily seen from Kingman's paintbox representation that for each $m = 1, 2, \ldots$ there is some composition $\mu$ of $m$ such that

\[ p(3, 2, \mu) = p(2, 3, \mu) > 0, \]

where for instance $(3, 2, \mu)$ means the composition of $5 + m$ obtained by concatenation of $(3, 2)$ and $\mu$. Indeed, it is clear that one can take either $\mu = 1^m$ or $\mu$ to be a single part of size $m$, according to whether the probability of at least three non-zero frequencies is zero or greater than zero. Applying (32) for suitable $k \ge 3$ with $p(1^k) > 0$, and cancelling some common factors of the form $q(n' : m')$, which are all strictly positive because $p(2, 2, 1) > 0$ implies $P(0 < W_1 < 1) > 0$, we see that for every $m = 1, 2, \ldots$

\[ \frac{q(m + 5 : 3)\, q(m + 3 : 2)}{q(m + 3 : 1)} = \frac{q(m + 5 : 2)\, q(m + 4 : 3)}{q(m + 4 : 1)}. \tag{33} \]

We have by the addition rule (20)

\[ q(m + 1 : 2) = q(m : 1) - q(m + 1 : 1), \qquad q(m + 2 : 3) = q(m : 1) - 2q(m + 1 : 1) + q(m + 2 : 1), \]

and introducing variables $x_m = q(m : 1)$, $n = m + 2$, the identity (33) becomes

\[ \frac{(x_{n+1} - 2x_{n+2} + x_{n+3})(x_n - x_{n+1})}{x_{n+1}} = \frac{(x_{n+2} - x_{n+3})(x_n - 2x_{n+1} + x_{n+2})}{x_{n+2}}. \]

The recursion is homogeneous. To pass to inhomogeneous variables, divide both sides of the equality by $x_n$, then set $y_n := x_{n+1}/x_n$ and rewrite as

\[ (1 - 2y_{n+1} + y_{n+1} y_{n+2})(1 - y_n) = (1 - y_{n+2})(1 - 2y_n + y_n y_{n+1}), \]

which simplifies as

\[ -2y_{n+1} + y_{n+1} y_{n+2} + y_n y_{n+1} = -y_n - y_{n+2} + 2 y_n y_{n+2}. \]

Finally, use the substitution

\[ y_n = 1 - \frac{1}{z_n} \]

to arrive at

\[ \frac{z_n - 2z_{n+1} + z_{n+2}}{z_n z_{n+1} z_{n+2}} = 0. \]

From this, $z_n$ is a linear function of $n$, which must be nondecreasing to agree with $0 < y_n < 1$.

If $z_n$ is not constant, then going back to the $x_n$'s we obtain

\[ q(n : 1) = c_0\, \frac{(b)_{n-1}}{(a + b)_{n-1}}, \qquad n \ge 3, \]

for some $a, b, c_0$, where the factor $c_0$ appears since the relation (33) is homogeneous. It is seen from the moments representation

\[ q(n : 1) = \int_{[0,1]} (1 - x)^{n-1}\, P(P_1 \in dx), \qquad n \ge 3, \]

that when $a, b$ are fixed, the factor $c_0$ is determined from the normalization by choosing a value of $P(P_1 = 1)$. The condition $p(n) \to 0$ means that $P(P_1 = 1) = 0$, in which case $c_0 = 1$ and the distribution of $P_1$ is beta$(a, b)$ with some positive $a, b$.

If $(z_n,\ n \ge 3)$ is a constant sequence, then $q(n : 1)$ is a geometric progression, and a similar argument shows that the case (ii) prevails. That $c = 1/M$ for some $M \ge 3$ is quite obvious: the only way that a size-biased choice of a frequency can be constant is if there are $M$ equal frequencies for some $M \ge 1$. The regularity assumption (15) rules out the cases $M = 1, 2$.

□
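The algebra in this proof can be sanity-checked in exact arithmetic. The sketch below is an illustration only, with hypothetical function names. It takes the two candidate forms of $q(n : m)$ from Lemma 10, confirms that both satisfy the identity (33) and the first addition rule, and computes $z_n = 1/(1 - y_n)$ with $y_n = q(n + 1 : 1)/q(n : 1)$, which should be linear in $n$ in case (i) and constant in case (ii).

```python
from fractions import Fraction


def rising(x, n):
    """Rising factorial (x)_n, with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def q_beta(n, m, a, b):
    """Case (i): q(n:m) = (a)_{m-1} (b)_{n-m} / (a+b)_{n-1}."""
    return rising(a, m - 1) * rising(b, n - m) / rising(a + b, n - 1)


def q_const(n, m, c):
    """Case (ii): q(n:m) = c^(m-1) (1-c)^(n-m)."""
    return c ** (m - 1) * (1 - c) ** (n - m)


def gap_33(q, m):
    """Left side minus right side of identity (33)."""
    lhs = q(m + 5, 3) * q(m + 3, 2) / q(m + 3, 1)
    rhs = q(m + 5, 2) * q(m + 4, 3) / q(m + 4, 1)
    return lhs - rhs


def z_values(q, n_max):
    """z_n = 1 / (1 - y_n) with y_n = q(n+1:1) / q(n:1), for n = 3..n_max."""
    return [1 / (1 - q(n + 1, 1) / q(n, 1)) for n in range(3, n_max + 1)]


a, b, c = Fraction(2, 5), Fraction(7, 3), Fraction(1, 4)
beta_q = lambda n, m: q_beta(n, m, a, b)
const_q = lambda n, m: q_const(n, m, c)
print([gap_33(beta_q, m) for m in range(1, 5)])  # all zero
print(z_values(beta_q, 7))                       # linear in n
print(z_values(const_q, 7))                      # constant
```

In case (i) the computed values are $z_n = (a + b + n - 1)/a$, and in case (ii) they equal $1/c$.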



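Proof of Theorem 9 In the case (i) of Lemma 10, substituting in (32) yields

\[ \frac{p(\lambda)}{p(1^k)} = \frac{(a)_{\lambda_1 - 1}(b)_{\Lambda_2}}{(a + b)_{\Lambda_1 - 1}} \cdot \frac{(a + b)_{\Lambda_2}}{(b)_{\Lambda_2}} \cdot \frac{(a)_{\lambda_2 - 1}(b)_{\Lambda_3 + 1}}{(a + b)_{\Lambda_2}} \cdot \frac{(a + b)_{\Lambda_3 + 1}}{(b)_{\Lambda_3 + 1}} \cdots \frac{(a)_{\lambda_k - 1}(b)_{k-1}}{(a + b)_{\lambda_k + k - 2}} \cdot \frac{(a + b)_{k-1}}{(b)_{k-1}} \]

provided $p(1^k) > 0$. After cancellation this becomes

\[ \frac{p(\lambda)}{p(1^k)} = \frac{(a + b)_{k-1}}{(a + b)_{n-1}} \prod_{j=1}^{k} (a)_{\lambda_j - 1}, \]

where $n = \Lambda_1 = \lambda_1 + \cdots + \lambda_k = |\lambda|$. Specializing,

\[ \frac{p(2, 1^{k-1})}{p(1^k)} = \frac{a}{a + b + k - 1}, \]

and using the addition rule (20)

\[ p(1^k) = k\, p(2, 1^{k-1}) + p(1^{k+1}), \]

we obtain the recursion

\[ \frac{p(1^{k+1})}{p(1^k)} = \frac{a + b + k(1 - a) - 1}{a + b + k - 1}. \]

Now (6) follows readily by the re-parametrisation $\theta = a + b - 1$, $\alpha = 1 - a$.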
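As a cross-check of this last step, the sketch below rebuilds $p(\lambda)$ from the recursion for $p(1^k)$ and the cancelled product formula. It then compares the result with the two-parameter EPPF under $\theta = a + b - 1$, $\alpha = 1 - a$. The closed form used for the two-parameter EPPF is the standard Ewens-Pitman product formula, assumed here to coincide with (6). All function names are illustrative.

```python
from fractions import Fraction


def rising(x, n):
    """Rising factorial (x)_n, with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def singleton_prob(k, a, b):
    """p(1^k), from p(1^1) = 1 and the recursion for p(1^(j+1)) / p(1^j)."""
    p = Fraction(1)
    for j in range(1, k):
        p *= (a + b + j * (1 - a) - 1) / (a + b + j - 1)
    return p


def eppf_from_beta(lam, a, b):
    """p(lam) = p(1^k) (a+b)_{k-1} / (a+b)_{n-1} * prod_j (a)_{lam_j - 1}."""
    n, k = sum(lam), len(lam)
    value = singleton_prob(k, a, b) * rising(a + b, k - 1) / rising(a + b, n - 1)
    for part in lam:
        value *= rising(a, part - 1)
    return value


def eppf_two_parameter(lam, alpha, theta):
    """Assumed standard form of the (alpha, theta) EPPF."""
    n, k = sum(lam), len(lam)
    value = Fraction(1)
    for i in range(1, k):
        value *= theta + i * alpha
    for part in lam:
        value *= rising(1 - alpha, part - 1)
    return value / rising(theta + 1, n - 1)


a, b = Fraction(2, 3), Fraction(5, 6)
alpha, theta = 1 - a, a + b - 1
for lam in [(3, 2, 1), (1, 1, 1, 1), (4, 2)]:
    print(lam, eppf_from_beta(lam, a, b), eppf_two_parameter(lam, alpha, theta))
```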

The case (ii) of Lemma 10 is even simpler, as it is immediate that $W_1 = 1/M$ implies that the partition is generated as if by coupon collecting with $M$ equally frequent coupons.
□

Proof of Corollary 4 As observed earlier, Corollary 4 characterizing the extended two-parameter family by the condition that

$W_1$ and $(W_2, W_3, \ldots)$ are independent (34)

can be read from Theorem 9 and Lemma 8. We find it interesting nonetheless to provide another proof of Corollary 4 based on analysis of the limit frequencies rather than the EPPF. This was in fact the first argument we found, without which we might not have persisted with the algebraic approach of the previous section.

Suppose then that $W_1, W_2, W_3, \ldots$ is the sequence of residual fractions associated with an EPPF $p$, and that (34) holds. The symmetry condition $p(r + 1, s + 1) = p(s + 1, r + 1)$ and the moment formula (22) give

\[ E\big(W_1^{r} (1 - W_1)^{s+1}\big)\, E\big(W_2^{s}\big) = E\big(W_1^{s} (1 - W_1)^{r+1}\big)\, E\big(W_2^{r}\big) \tag{35} \]

for non-negative integers $r$ and $s$. Setting $r = 0$, this expresses moments of $W_2$ in terms of the moments of $W_1$. So the distribution of $W_1$ determines that of $W_2$. Assume now the regularity condition (15). According to Lemma 10 we are reduced either to the case with $M$ equal frequencies with sum 1, or to the case where $W_1$ has a beta distribution, and hence so does $W_2$, by consideration of (35). There is nothing more to discuss in the first case, so we assume for the rest of this section that

each of $W_1$ and $W_2$ has a non-degenerate beta distribution, with possibly different parameters. (36)

Recall that

$P_1 = W_1$ and $P_2 = (1 - W_1) W_2$.

As observed in [21],

the conditional distribution of $(P_3, P_4, \ldots)$ given $P_1$ and $P_2$ depends symmetrically on $P_1$ and $P_2$.

This can be seen from Kingman's paintbox representation, which implies that conditionally given $(P_i^{\downarrow})$, as well as $P_1$ and $P_2$, the sequence $(P_3, P_4, \ldots)$ is derived by a process of random sampling from the frequencies $(P_i^{\downarrow})$ with the terms $P_1$ and $P_2$ deleted. No matter what $(P_i^{\downarrow})$ is, this process depends symmetrically on $P_1$ and $P_2$, so the same is true without the extra conditioning on $(P_i^{\downarrow})$.

Since $P_1 + P_2$ is a symmetric function of $P_1$ and $P_2$, and $(W_3, W_4, \ldots)$ is a measurable function of $P_1 + P_2$ and $(P_3, P_4, \ldots)$,

the conditional distribution of $(W_3, W_4, \ldots)$ given $(P_1, P_2)$ depends symmetrically on $P_1$ and $P_2$.

The condition that $W_1$ is independent of $(W_2, W_3, W_4, \ldots)$ implies easily that

$W_1$ is conditionally independent of $(W_3, W_4, \ldots)$ given $W_2$.

Otherwise put:

$P_1$ is conditionally independent of $(W_3, W_4, \ldots)$ given $P_2/(1 - P_1)$,

hence by the symmetry discussed above

$P_2$ is conditionally independent of $(W_3, W_4, \ldots)$ given $P_1/(1 - P_2)$.

Let $X := P_2/(1 - P_1)$, $Y := P_1/(1 - P_2)$ and $Z := (W_3, W_4, \ldots)$. Then we have both

$X$ is conditionally independent of $Z$ given $Y$, (37)

and

$Y$ is conditionally independent of $Z$ given $X$, (38)

from which it follows under suitable regularity conditions (see Lemma 11 below) that

$(X, Y)$ is independent of $Z$, (39)

meaning in the present context that

$W_1$, $W_2$ and $(W_3, W_4, \ldots)$ are independent. (40)

Lauritzen [15, Proposition 3.1] shows that (37) and (38) imply (39) under the assumption that $(X, Y, Z)$ has a positive and continuous joint density relative to a product measure. From (36) and strict positivity of the beta densities on $(0, 1)$, we see that $(X, Y)$ has a strictly positive and continuous density relative to Lebesgue measure on $(0, 1)^2$. We are not in a position to assume that $(X, Y, Z)$ has a density relative to a product measure. However, the passage from (37) and (38) to (39) is justified by Lemma 11 below without need for a trivariate density. So we deduce that (40) holds. By Lemma 7, $(W_2, W_3, \ldots)$ is the sequence of residual fractions of an exchangeable partition $\Pi'$, and $W_2$ has a beta density. So either $W_3 = 1$ and we are in the case (12) with $M = 3$, or $W_3$ has a beta density, and the previous argument applies to show that

$W_1$, $W_2$, $W_3$ and $(W_4, W_5, \ldots)$ are independent.

Continue by induction to conclude the independence of $W_1, W_2, \ldots, W_k$ for all $k$ such that $p(1^k) > 0$. □

Lemma 11 Let $X$, $Y$ and $Z$ denote random variables with values in arbitrary measurable spaces, all defined on a common probability space, such that (37) and (38) hold. If the joint distribution of the pair $(X, Y)$ has a strictly positive probability density relative to some product probability measure, then (39) holds.

Proof. Let $p(X, Y)$ be a version of $P(Z \in B \mid X, Y)$ for $B$ a measurable set in the range of $Z$. By standard measure theory (e.g. Kallenberg [10, 6.8]) the conditional independence assumption (38) gives $P(Z \in B \mid X, Y) = P(Z \in B \mid X)$ a.s. so that

$p(X, Y) = g(X)$ a.s. for some measurable function $g$.

Similarly from the conditional independence assumption (37),

$p(X, Y) = h(Y)$ a.s. for some measurable function $h$,

and we wish to conclude that

$p(X, Y) = c$ a.s. for some constant $c$.

To complete the argument it suffices to draw this conclusion from the above two assumptions about a jointly measurable function $p$, with $(X, Y)$ the identity map on the product space of pairs $\mathcal{X} \times \mathcal{Y}$, and the two almost sure equalities holding with respect to some probability measure $P$ on this space, with $P$ having a strictly positive density relative to a product probability measure $\mu \otimes \nu$. Fix $u \in (0, 1)$. From the previous assumptions it follows that

\[ \{p(X, Y) > u\} = \{X \in A_u\} = \{Y \in C_u\} \quad \text{a.s.} \tag{41} \]

for some measurable sets $A_u$, $C_u$, whence

\[ \{p(X, Y) > u\} = \{X \in A_u\} \cap \{Y \in C_u\} \quad \text{a.s.}, \tag{42} \]

where the almost sure equalities hold both with respect to the joint distribution $P$ of $(X, Y)$, and with respect to a product probability measure $\mu \otimes \nu$ governing $(X, Y)$. But under $\mu \otimes \nu$ the random variables $X$ and $Y$ are independent. So if $q := (\mu \otimes \nu)(p(X, Y) > u)$, then (41) and (42) imply that $q = q^2$, so $q = 0$ or $q = 1$. Thus $p(X, Y)$ is constant a.s. with respect to $\mu \otimes \nu$, hence also constant with respect to $P$. □

5 The deletion property without the regularity condition

Observe that the property required in Theorem 3 is void if $\pi_n$ happens to be the one-block partition $(n)$. This readily implies that mixing with the trivial one-block partition $\mathbf{1}$ does not destroy the property. Therefore the $\mathbf{1}$-component may be excluded from the consideration, meaning that it is enough to focus on the case

$P_1 < 1$ a.s., or equivalently $P_1^{\downarrow} < 1$ a.s., or equivalently $\lim_{n \to \infty} p(n) = 0$. (43)

Suppose then that this condition holds, but that the first condition in (15) does not hold, so that $p(2, 2, 1) = 0$. Then

\[ P(P_2^{\downarrow} = 1 - P_1^{\downarrow} > 0) + P(P_1^{\downarrow} < 1,\ P_2^{\downarrow} = 0) = 1. \]

If both terms have positive probability then $P(W_2 = 1 \mid W_1 = 0) = 0$ but $P(W_2 = 1 \mid W_1 > 0) > 0$, so the independence of $W_1$ and $W_2$ fails. Thus the independence forces either $P(P_2^{\downarrow} = 1 - P_1^{\downarrow} > 0) = 1$ or $P(P_1^{\downarrow} < 1,\ P_2^{\downarrow} = 0) = 1$. The two cases are readily treated:

(i) If $P(P_2^{\downarrow} = 1 - P_1^{\downarrow} > 0) = 1$ then $W_2 = 1$ a.s. and the independence trivially holds. This is the case when $\Pi$ has two blocks almost surely.

(ii) If $P(P_1^{\downarrow} < 1,\ P_2^{\downarrow} = 0) = 1$ and $P(P_1^{\downarrow} > 0) > 0$ then $P(W_2 > 0 \mid W_1 > 0) = 0$ but $P(W_2 > 0 \mid W_1 = 0) > 0$, hence $W_1$ and $W_2$ are not independent. Therefore $P(P_1^{\downarrow} < 1,\ P_2^{\downarrow} = 0) = 1$ and the independence imply $P_1^{\downarrow} = 0$ a.s., meaning that $\Pi = \mathbf{0}$.

We conclude that the most general exchangeable partition $\Pi$ which has the property in Theorem 9 is a two-component mixture, in which the first component is either a partition from the extended two-parameter family, or a two-block partition as in (i) above, or $\mathbf{0}$, and the second component is the trivial partition $\mathbf{1}$.

6 Regeneration and τ-deletion

In this section we partly survey and partly extend the results from [5, 6] concerning characterizations of $(\alpha, \theta)$ partitions by regeneration properties. As in Kingman's study of the regenerative processes [12], subordinators (increasing Lévy processes) appear naturally in our framework of multiplicative regenerative phenomena. Following [6], we call a partition structure $(\pi_n)$ regenerative if

for each $n$ it is possible to delete a randomly chosen part of $\pi_n$ in such a way that for each $0 < m < n$, given the deleted part is of size $m$, the remaining parts form a partition of $n - m$ with the same distribution as $\pi_{n-m}$.

In terms of an exchangeable partition $\Pi = (\Pi_n)$ of $\mathbb{N}$, the associated partition structure $(\pi_n)$ is regenerative if and only if

for each $n$ it is possible to select a random block $B_{nJ_n}$ of $\Pi_n$ in such a way that for each $0 < m < n$, conditionally given that $|B_{nJ_n}| = m$ the partition $\Pi_n \setminus B_{nJ_n}$ of $[n - m]$ is distributed according to the unconditional distribution of $\Pi_{n-m}$:

\[ (\Pi_n \setminus B_{nJ_n} \ \text{given}\ |B_{nJ_n}| = m) \stackrel{d}{=} \Pi_{n-m} \tag{44} \]

where $\Pi_n \setminus B_{nJ_n}$ is defined as in the discussion preceding Theorem 9. Moreover, there is no loss of generality in supposing further that the conditional distribution of $J_n$ given $\Pi_n$ is of the form

\[ P(J_n = j \mid \Pi_n = \{B_1, \ldots, B_k\}) = d(|B_1|, \ldots, |B_k|;\, j) \tag{45} \]

for some symmetric deletion kernel $d$, meaning a non-negative function of a composition $\lambda$ of $n$ and $1 \le j \le k_\lambda$ such that

\[ d(\lambda_1, \lambda_2, \ldots, \lambda_k;\, j) = d(\lambda_{\sigma(1)}, \lambda_{\sigma(2)}, \ldots, \lambda_{\sigma(k)};\, 1) \tag{46} \]

for every permutation $\sigma$ of $[k]$ with $\sigma(1) = j$. To determine a symmetric deletion kernel, it suffices to specify $d(\lambda; 1)$, which is the conditional probability, given blocks of sizes $\lambda_1, \lambda_2, \ldots, \lambda_k$, of picking the first of these blocks. This is a non-negative symmetric function of $(\lambda_2, \ldots, \lambda_k)$, subject to the further constraint that its extension to arguments $j \ne 1$ via (46) satisfies

\[ \sum_{j=1}^{k_\lambda} d(\lambda; j) = 1 \]

for every composition $\lambda$ of $n$. The regeneration condition can now be reformulated in terms of the EPPF $p$ of $\Pi$ in a manner similar to (30):

Lemma 12 An exchangeable random partition $\Pi$ with EPPF $p$ is regenerative if and only if there exists a symmetric deletion kernel $d$ such that

\[ p(\lambda)\, d(\lambda; 1) = q(n, \lambda_1) \binom{n}{\lambda_1}^{-1} p(\lambda_2, \ldots, \lambda_k) \tag{47} \]

for every composition $\lambda$ of $n$ into at least two parts and some non-negative function $q$. Then

\[ q(n, m) = P(|B_{nJ_n}| = m) \qquad (m \in [n]) \tag{48} \]

for $J_n$ as in (45).

Proof. Formula (47) offers two different ways of computing the probability of the event that $\Pi_n = \{B_1, \ldots, B_k\}$ and $J_n = 1$ for an arbitrary partition $\{B_1, \ldots, B_k\}$ of $[n]$ with $|B_i| = \lambda_i$ for $i \in [k]$: on the left side, by definition of the symmetric deletion kernel, and on the right side by conditioning on the event $B_{nJ_n} = B_1$ and appealing to the regeneration property and exchangeability. □

Consider now the question of whether an $(\alpha, \theta)$ partition with EPPF $p = p_{\alpha,\theta}$ as in (6) is regenerative with respect to some deletion kernel. By the previous lemma and cancellation of common factors, the question is whether there exists a symmetric deletion kernel $d(\lambda; j)$ such that the value of

\[ q(n, \lambda_1) = d(\lambda; 1) \binom{n}{\lambda_1} \frac{(1 - \alpha)_{\lambda_1 - 1}\,(\theta + (k - 1)\alpha)}{(\theta + n - \lambda_1)_{\lambda_1}} \tag{49} \]

is the same for all compositions $\lambda$ of $n$ with $k$ parts and a prescribed value of $\lambda_1$. But it is easily checked that the formula

\[ d(\lambda; j) = \frac{\theta \lambda_j + \alpha (n - \lambda_j)}{n\,(\theta + \alpha (k - 1))} \tag{50} \]

provides just such a symmetric deletion kernel. Note that the kernel depends on $(\alpha, \theta)$ only through the ratio $\tau := \alpha/(\alpha + \theta)$, and that the kernel is non-negative for all compositions $\lambda$ only if both $\alpha$ and $\theta$ are non-negative.
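The "easily checked" claim can be confirmed in exact arithmetic. The sketch below uses illustrative function names and arbitrarily chosen parameter values. It evaluates (49) with the kernel (50) on several compositions of the same $n$ with the same first part, and also confirms that the kernel sums to 1 over $j$.

```python
from fractions import Fraction
from math import comb


def rising(x, n):
    """Rising factorial (x)_n, with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def deletion_kernel(lam, j, alpha, theta):
    """Formula (50): probability of deleting the j-th part (1-based) of lam."""
    n, k = sum(lam), len(lam)
    part = lam[j - 1]
    return (theta * part + alpha * (n - part)) / (n * (theta + alpha * (k - 1)))


def q_from_49(lam, alpha, theta):
    """Right-hand side of (49) with d(lam; 1) taken from (50)."""
    n, k, first = sum(lam), len(lam), lam[0]
    return (deletion_kernel(lam, 1, alpha, theta) * comb(n, first)
            * rising(1 - alpha, first - 1) * (theta + alpha * (k - 1))
            / rising(theta + n - first, first))


alpha, theta = Fraction(1, 3), Fraction(1, 2)
# Compositions of n = 6 with first part 2, with varying numbers of parts.
for lam in [(2, 3, 1), (2, 1, 3), (2, 2, 2), (2, 4), (2, 1, 1, 1, 1)]:
    print(lam, q_from_49(lam, alpha, theta))
```

With this kernel the factor $\theta + \alpha(k - 1)$ cancels, so the value does not depend on the number of parts $k$ either.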

To provide a more general context for this and later discussions, let $(x_1, \ldots, x_k)$ be a fixed sequence of positive numbers with sum $s = \sum_{j=1}^{k} x_j$. For a fixed parameter $\tau \in [0, 1]$, define a random variable $T$ with values in $[k]$ by

\[ P(T = j \mid (x_1, \ldots, x_k)) = \frac{(1 - \tau)\, x_j + \tau (s - x_j)}{s\,(1 - \tau + \tau (k - 1))}. \tag{51} \]

The random variable $x_T$ is called a τ-biased pick from $x_1, \ldots, x_k$. The law of $x_T$ does not depend on the order of the sequence $(x_1, \ldots, x_k)$, and there is also a scaling invariance: $s^{-1} x_T$ is a τ-biased pick from $(s^{-1} x_1, \ldots, s^{-1} x_k)$. Note that a 0-biased pick is a size-biased pick from $(x_1, \ldots, x_k)$, choosing any particular element with probability proportional to its size. A 1/2-biased pick is a uniform random choice from the list, as (51) then equals $1/k$ for all $j$. And a 1-biased pick may be called a co-size biased pick, as it chooses $j$ with probability proportional to its co-size $s - x_j$.

These definitions are now applied to the sequence of block sizes $\lambda_j$ of the restriction to $[n]$ of an exchangeable partition $\Pi$ of $\mathbb{N}$. We denote by $T_n$ a random variable whose conditional distribution given $\Pi_n$ with $k$ blocks and $|B_{nj}| = \lambda_j$ for $j \in [k]$ is defined by (51), and denote by $B_{nT_n}$ the τ-biased pick from the sequence of blocks of $\Pi_n$. We call $\Pi$ τ-regenerative if $\Pi_n$ is regenerative with respect to deletion of the τ-biased pick $B_{nT_n}$.

Theorem 13 [5, 6] For each $\tau \in [0, 1]$, apart from the constant partitions $\mathbf{0}$ and $\mathbf{1}$, the only exchangeable partitions of $\mathbb{N}$ that are τ-regenerative are the members of the two-parameter family with parameters in the range

\[ \{(\alpha, \theta) \in [0, 1] \times [0, \infty] : \alpha/(\alpha + \theta) = \tau\}. \]

Explicitly, the distribution of the τ-biased pick for such $(\alpha, \theta)$ partitions of $[n]$ is

\[ P(|B_{nT_n}| = m) = \binom{n}{m} \frac{(1 - \alpha)_{m-1}}{(\theta + n - m)_m}\, \frac{(n - m)\alpha + m\theta}{n}, \qquad m \in [n]. \tag{52} \]

Proof. The preceding discussion around (49) and (50) shows that members of the two-parameter family with parameters in the indicated range are τ-regenerative, and gives the formula (52) for the decrement matrix. See [6] for the proof that these are the only non-degenerate exchangeable partitions of $\mathbb{N}$ that are τ-regenerative. □
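The τ-biased pick (51) and the decrement distribution (52) are simple enough to tabulate exactly. The following sketch uses illustrative function names. It treats a one-element list as a special case and assumes $\theta > 0$ so that no denominator in (52) vanishes. It evaluates both formulas and lets one confirm that (52) sums to 1 over $m \in [n]$.

```python
from fractions import Fraction
from math import comb


def rising(x, n):
    """Rising factorial (x)_n, with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def tau_biased_probs(x, tau):
    """Formula (51): the probabilities P(T = j), j = 1..k, as a list."""
    tau = Fraction(tau)
    s, k = sum(x), len(x)
    if k == 1:
        return [Fraction(1)]  # only one element to pick
    norm = s * (1 - tau + tau * (k - 1))
    return [((1 - tau) * xj + tau * (s - xj)) / norm for xj in x]


def pick_size_pmf(n, alpha, theta):
    """Formula (52): P(|B_{n T_n}| = m) for m = 1..n, assuming theta > 0."""
    return [comb(n, m) * rising(1 - alpha, m - 1) / rising(theta + n - m, m)
            * ((n - m) * alpha + m * theta) / n
            for m in range(1, n + 1)]


print(tau_biased_probs([1, 2, 3], 0))               # size-biased
print(tau_biased_probs([1, 2, 3], Fraction(1, 2)))  # uniform
print(tau_biased_probs([1, 2, 3], 1))               # co-size biased
print(sum(pick_size_pmf(5, Fraction(1, 3), Fraction(1, 2))))
```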

In particular, each $(\alpha, \alpha)$ partition is 1/2-regenerative, meaning regenerative with respect to deletion of a block chosen uniformly at random. The constant partitions $\mathbf{0}$ and $\mathbf{1}$ are obviously τ-regenerative for every $\tau \in [0, 1]$. This is consistent with the characterization above because the $(1, \theta)$ partition is the $\mathbf{0}$ partition for every $\theta \ge 0$, and because the partition $\mathbf{1}$ can be reached as a limit of $(\alpha, \theta)$ partitions as $\alpha, \theta \downarrow 0$ with $\alpha(\alpha + \theta)^{-1}$ held fixed.



Multiplicative regeneration By Corollary 4, if $(P_j)$ is the sequence of limit frequencies for a $(0, \theta)$ partition for some $\theta > 0$, and if the first limit frequency $P_1$ is deleted and the other frequencies renormalized to sum to 1, then the resulting sequence $(Q_j)$ is independent of $P_1$ and has the same distribution as $(P_j)$. Because $P_1$ is a size-biased pick from the sequence $(P_j)$, this regenerative property of the frequencies $(P_j)$ can be seen as an analogue of the 0-regeneration property of the $(0, \theta)$ partitions.

If $(P_j)$ is instead the sequence of limit frequencies of an $(\alpha, \theta)$ partition $\Pi$ for parameters satisfying $0 < \alpha < 1$, $\alpha/(\alpha + \theta) = \tau$, a question arises: does the regenerative property of $\Pi_n$ with respect to a τ-biased pick have an analogue in terms of a τ-biased pick from the frequencies $(P_j)$? This cannot be answered straightforwardly as in the $\tau = 0$ case, because when $\tau > 0$ the formula (51) defines a proper probability distribution only for series $(x_j)$ with some finite number $k$ of positive terms. For instance, in the case $\tau = 1/2$ there is no such analogue of (51) as 'uniform random choice' from infinitely many terms.

Still, Ewens' case provides a clue if we turn to a bulk deletion. Let $P_J$ be a size-biased pick from the frequencies $(P_j)$, as defined by (4), and let $(Q_j)$ be a sequence obtained from $(P_j)$ by deleting all $P_1, \ldots, P_J$ and renormalizing. Then $(Q_j)$ is independent of $P_1, \ldots, P_J$, and $(Q_j) \stackrel{d}{=} (P_j)$. The latter assertion follows from the i.i.d. property of the residual fractions and by noting that (1) is identical with

\[ P(J = j \mid (W_i,\ i \in \mathbb{N})) = W_j \prod_{i=1}^{j-1} (1 - W_i). \]

A similar bulk deletion property holds for partitions in the Ewens' family, in the form:

\[ (\Pi_n \setminus (B_{n1} \cup \cdots \cup B_{nJ_n}) \ \text{given}\ |B_{n1} \cup \cdots \cup B_{nJ_n}| = m) \stackrel{d}{=} \Pi_{n-m} \]

for all $1 \le m \le n$, where $B_{nJ_n}$ is a size-biased pick from the blocks.

To make the ansatz of bulk deletion work for $\tau \ne 0$ it is necessary to arrange the frequencies in a more complex manner. To start with, we modify the paintbox construction. Let $\mathcal{U} \subset [0, 1]$ be a random open set canonically represented as the union of its disjoint open component intervals. We suppose that the Lebesgue measure of $\mathcal{U}$, equal to the sum of lengths of the components, is 1 almost surely. We associate with $\mathcal{U}$ an exchangeable partition $\Pi$ exactly as in Kingman's representation in Theorem 2. For each component interval $G \subset \mathcal{U}$ there is an index $i_G := \min\{n : U_n \in G\}$ that is the minimal index of a sequence $(U_i)$ of i.i.d. uniform[0,1] points hitting the interval, and for all $j$, $P_j$ is the length of the $j$th component interval when the intervals are listed in order of increasing minimal indices. So $(P_j)$ is a size-biased permutation of the lengths of interval components of $\mathcal{U}$.

Let $\prec$ be the linear order on $\mathbb{N}$ induced by the interval order of the components of $\mathcal{U}$, so $j \prec k$ iff the interval of length $P_j$, which is the home interval of the $j$th block $B_j$ to appear in the process of uniform random sampling of intervals, lies to the left of the interval of length $P_k$ associated with block $B_k$. A convergence argument shows that $\mathcal{U}$ is uniquely determined by $(P_j)$ and $\prec$. In loose terms, $\mathcal{U}$ is an arrangement of a sequence of tiles of sizes $P_j$ in the order on indices $j$ prescribed by $\prec$, and this arrangement is constructable by sequentially placing the tile $j$ in the position prescribed by the order $\prec$ restricted to $[j]$.

For $x \in [0, 1)$ let $(a_x, b_x) \subset \mathcal{U}$ be the component interval containing $x$. Define $\mathcal{V}_x$ as the open set obtained by deleting the bulk of component intervals to the left of $b_x$, then linearly rescaling the remaining set $\mathcal{U} \cap [b_x, 1]$ to $[0, 1]$. We say that $\mathcal{U}$ is multiplicatively regenerative if for each $x \in [0, 1)$, $\mathcal{V}_x$ is independent of $\mathcal{U} \cap [0, b_x]$ and $\mathcal{V}_x \stackrel{d}{=} \mathcal{U}$.

An ordered version of the paintbox correspondence yields:

Theorem 14 [5, 6] An exchangeable partition $\Pi$ is regenerative if and only if it has a paintbox representation in terms of some multiplicatively regenerative set $\mathcal{U}$. The deletion operation is then defined by classifying $n$ independent uniform points from $[0, 1]$ according to the intervals of $\mathcal{U}$ into which they fall, and deleting the block of points in the leftmost occupied interval.

A property of the frequencies $(P_j)$ of an exchangeable regenerative partition $\Pi$ of $\mathbb{N}$ now emerges: there exists a strict total order $\prec$ on $\mathbb{N}$, which is a random order, which has some joint distribution with $(P_j)$ such that arranging the intervals of sizes $(P_j)$ in order $\prec$ yields a multiplicatively regenerative set $\mathcal{U}$. Equivalently, there exists a multiplicatively regenerative set $\mathcal{U}$ that induces a partition with frequencies $(P_j)$ and an associated order $\prec$. This set $\mathcal{U}$ is then necessarily unique in distribution as a random element of the space of open subsets of $[0, 1]$ equipped with the Hausdorff metric [5] on the complementary closed subsets. A subtle point here is that the joint distribution of $(P_j)$ and $\prec$ is not unique, and neither is the joint distribution of $(P_j)$ and $\mathcal{U}$, unless further conditions are imposed. For instance, one way to generate $\prec$ is to suppose that the $(P_j)$ are generated by a process of uniform random sampling from $\mathcal{U}$. But for a $(0, \theta)$ partition, we know that another way is to construct $\mathcal{U}$ from $(P_j)$ by simply placing the intervals in deterministic order $P_1, P_2, \ldots$ from left to right. In the construction by uniform random sampling from $\mathcal{U}$ the interval of length $P_1$ discovered by the first sample point need not be the leftmost, and need not lie to the left of the second discovered interval $P_2$.

In [5] we showed that the multiplicative regeneration of $\mathcal{U}$ follows from an apparently weaker property: if $(a_U, b_U)$ is the component interval of $\mathcal{U}$ containing a uniform[0,1] sample $U$ independent of $\mathcal{U}$, and if $\mathcal{V}$ is defined as the open set obtained by deleting the component intervals to the left of $b_U$ and linearly rescaling the remaining set $\mathcal{U} \cap [b_U, 1]$ to $[0, 1]$, then given $b_U < 1$, $\mathcal{V}$ is independent of $b_U$ (hence, as we proved, independent of $\mathcal{U} \cap [0, b_U]$ too!) and has distribution equal to the unconditional distribution of $\mathcal{U}$. This independence is the desired analogue for more general regenerative partitions of the bulk-deletion property of Ewens' partitions.

The fundamental representation of multiplicatively regenerative sets involves a random process $F_t$ known in statistics as a neutral-to-the-right distribution function.

Theorem 15 [5] A random open set $\mathcal{U}$ of Lebesgue measure 1 is multiplicatively regenerative if and only if there exists a drift-free subordinator $S = (S_t,\ t \ge 0)$ with $S_0 = 0$ such that $\mathcal{U}$ is the complement to the closed range of the process $F_t = 1 - \exp(-S_t)$, $t \ge 0$. The Lévy measure of $S$ is determined uniquely up to a positive factor.

According to Theorems 14 and 15, regenerative partition structures with proper frequencies are parameterised by a measure $\tilde{\nu}(du)$ on $(0, 1]$ with finite first moment, which is the image via the transformation from $s$ to $1 - \exp(-s)$ of the Lévy measure $\nu(ds)$ on $(0, \infty]$ associated with the subordinator $S$. The Laplace exponent $\Phi$ of the subordinator, defined by the Lévy-Khintchine formula

\[ E[\exp(-a S_t)] = \exp[-t\, \Phi(a)], \qquad a \ge 0, \]

determines the Lévy measure $\nu(ds)$ on $(0, \infty]$ and its image $\tilde{\nu}(du)$ on $(0, 1]$ via the formulae

\[ \Phi(a) = \int_{(0, \infty]} (1 - e^{-ax})\, \nu(dx) = \int_{(0, 1]} (1 - (1 - x)^{a})\, \tilde{\nu}(dx). \]

As shown in 0, the decrement matrix q of the regenerative partition structure, as in ff48l) . 
is then

\[ q(n, m) = \frac{\Phi(n, m)}{\Phi(n)}, \qquad 1 \le m \le n,\quad n = 1, 2, \ldots \]

where

\[ \Phi(n, m) = \binom{n}{m} \int_{(0, 1]} x^{m} (1 - x)^{n-m}\, \tilde{\nu}(dx). \]

Uniqueness of the parameterisation is achieved by a normalisation condition, such as $\Phi(1) = 1$.

In [5] the subordinator $S^{(\alpha,\theta)}$ which produces $\mathcal{U}$ as in Theorem 15 for the $(\alpha, \theta)$ partition was identified by the following formula for the right tail of its Lévy measure:

\[ \nu(x, \infty] = (1 - e^{-x})^{-\alpha}\, e^{-x\theta}, \qquad x > 0. \tag{53} \]

The subordinator $S^{(0,\theta)}$ is a compound Poisson process whose jumps are exponentially distributed with rate $\theta$. For $\theta = 0$ the Lévy measure has a unit mass at $\infty$, so the subordinator $S^{(\alpha,0)}$ is killed at unit rate. The $S^{(\alpha,\alpha)}$ subordinator belongs to the class of Lamperti-stable processes recently studied in [1]. For positive parameters the subordinator $S^{(\alpha,\theta)}$ can be constructed from the $(0, \theta)$ and $(\alpha, 0)$ cases, as follows. First split $\mathbb{R}_+$ by the range of $S^{(0,\theta)}$, that is at points $E_1 < E_2 < \ldots$ of a Poisson process with rate $\theta$. Then run an independent copy of $S^{(\alpha,0)}$ up to the moment the process crosses $E_1$ at some random time, say $t_1$. The level-overshooting value is neglected and the process is stopped. At the same time $t_1$ a new independent copy of $S^{(\alpha,0)}$ is started at value $E_1$ and run until crossing $E_2$ at some random time $t_2$, and so on.

In terms of $F_t = 1 - \exp(-S_t)$, the range of the process in the $(0, \theta)$ case is a stick-breaking set $\{1 - \prod_{i=1}^{j} (1 - V_i),\ j = 0, 1, \ldots\}$ with i.i.d. beta$(1, \theta)$ factors $V_i$. In the case $(\alpha, 0)$ the range of $(F_t)$ is the intersection of $[0, 1]$ with the α-stable set (the range of an α-stable subordinator). In other cases $\mathcal{U}$ is constructable as a cross-breed of the cases $(0, \theta)$ and $(\alpha, 0)$: first $[0, 1]$ is partitioned in subintervals by the beta$(1, \theta)$ stick-breaking, then each subinterval $(a, b)$ of this partition is further split by an independent copy of the multiplicatively regenerative $(\alpha, 0)$ set, shifted to start at $a$ and truncated at $b$.
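As a consistency check between the Lévy measure (53) and the decrement distribution (52), consider the case $\alpha = 0$. The sketch below rests on a short calculation that is not spelled out in the text and should be read as an assumption: for $\alpha = 0$, (53) gives $\nu(ds) = \theta e^{-\theta s}\,ds$, whose image under $s \mapsto 1 - e^{-s}$ is $\tilde{\nu}(du) = \theta (1 - u)^{\theta - 1}\,du$, so that $\Phi(n) = n/(n + \theta)$ and $\Phi(n, m) = \binom{n}{m}\, \theta\, B(m + 1,\, n - m + \theta)$. To keep the arithmetic exact, $\theta$ is restricted to positive integers, so the beta integral is a ratio of factorials. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def rising(x, n):
    """Rising factorial (x)_n, with (x)_0 = 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def phi_n(n, theta):
    """Phi(n) for the (0, theta) case (derived above, not normalised)."""
    return Fraction(n, n + theta)


def phi_nm(n, m, theta):
    """Phi(n, m) for the (0, theta) case, with integer theta >= 1."""
    beta_integral = Fraction(factorial(m) * factorial(n - m + theta - 1),
                             factorial(n + theta))
    return comb(n, m) * theta * beta_integral


def q_levy(n, m, theta):
    """Decrement matrix entry q(n, m) = Phi(n, m) / Phi(n)."""
    return phi_nm(n, m, theta) / phi_n(n, theta)


def q_formula_52(n, m, alpha, theta):
    """Formula (52) for the same entry."""
    return (comb(n, m) * rising(1 - alpha, m - 1) / rising(theta + n - m, m)
            * ((n - m) * alpha + m * theta) / n)


for m in range(1, 5):
    print(m, q_levy(4, m, 2), q_formula_52(4, m, 0, 2))
```

The ratio $\Phi(n, m)/\Phi(n)$ is unaffected by the normalisation $\Phi(1) = 1$, so no rescaling is applied here.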

Constructing the order Following [6, 25], we shall describe an arrangement which allows us to pass from $(\alpha, \theta)$ frequencies $(P_j)$ to the multiplicatively regenerative set associated with the subordinator $S^{(\alpha,\theta)}$. The connection between size-biased permutation and τ-deletion (Lemma 17) is new.

A linear order $\prec$ on $\mathbb{N}$ is conveniently described by a sequence of the initial ranks $(\rho_j) \in [1] \times [2] \times \cdots$, with $\rho_j = i$ if and only if $j$ is ranked $i$th smallest in the order $\prec$ among the integers $1, \ldots, j$. For instance, the initial ranks $1, 2, 1, 3, \ldots$ appear when $3 \prec 1 \prec 4 \prec 2$.

For $\xi \in [0, \infty]$ define a random order $\prec_\xi$ on $\mathbb{N}$ by assuming that the initial ranks $\rho_k$, $k \in \mathbb{N}$, are independent, with distribution

\[ P(\rho_k = j) = \frac{1(0 < j < k) + \xi\, 1(j = k)}{\xi + k - 1}, \qquad k \ge 1. \]

The edge cases $\xi = 0, \infty$ are defined by continuity. The order $\prec_1$ is a 'uniformly random order', in the sense that restricting to $[n]$ we have all $n!$ orders equally likely, for every $n$. The order $\prec_\infty$ coincides with the standard order $<$ almost surely. For every permutation $i_1, \ldots, i_n$ of $[n]$, we have

\[ P(i_1 \prec_\xi \cdots \prec_\xi i_n) = \frac{\xi^{r}}{\xi (\xi + 1) \cdots (\xi + n - 1)} \]

where $r$ is the number of upper records in the permutation. See [8] for this and more general permutations with tilted record statistics.
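The encoding of an order by initial ranks, and the product form of the law of $\prec_\xi$, can be made concrete as follows. This is a minimal sketch with illustrative function names. It assumes $0 < \xi < \infty$ and counts as an upper record every $k$ with $\rho_k = k$, including $k = 1$.

```python
from fractions import Fraction
from itertools import product


def order_from_initial_ranks(ranks):
    """List 1..n from smallest to largest in the order encoded by the ranks."""
    order = []
    for j, rank in enumerate(ranks, start=1):
        order.insert(rank - 1, j)  # j becomes the rank-th smallest among 1..j
    return order


def rank_probability(ranks, xi):
    """Probability of a given sequence of independent initial ranks under xi."""
    prob = Fraction(1)
    for k, rank in enumerate(ranks, start=1):
        prob *= (xi if rank == k else 1) / Fraction(xi + k - 1)
    return prob


def all_rank_sequences(n):
    """All elements of [1] x [2] x ... x [n]."""
    return product(*[range(1, k + 1) for k in range(1, n + 1)])


print(order_from_initial_ranks([1, 2, 1, 3]))  # [3, 1, 4, 2]
print(rank_probability([1, 2, 1, 3], 2))       # 1/30
```

In the example the records occur at $k = 1, 2$, so $r = 2$ and the probability is $\xi^2 / (\xi(\xi+1)(\xi+2)(\xi+3))$, which equals $1/30$ at $\xi = 2$.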



Theorem 16 [25, Corollary 7] For $0 < \alpha < 1$, $\theta \ge 0$ the arrangement of GEM$(\alpha, \theta)$
frequencies $(P_j)$ represented as open intervals in an independent random order $\lhd_{\theta/\alpha}$ is a
multiplicatively regenerative open set $U \subset [0, 1]$, where $U$ is representable as the complement
of the closed range of the process $F_t = 1 - \exp(-S_t)$, $t \ge 0$, for $S$ the subordinator
with Lévy measure (53).

This result was presented without proof as [25, Corollary 7], in a context where the
regenerative ordering of frequencies was motivated by an application to a tree growth process.
Here we offer a proof which exposes the combinatorial structure of the composition
of size-biased permutation and a $\lhd_{\theta/\alpha}$ ordering of frequencies.

For a sequence of positive reals $(x_1, \ldots, x_k)$, define the $\tau$-biased permutation of this
sequence, denoted $\mathrm{perm}_\tau(x_1, \ldots, x_k)$, by iterating a single $\tau$-biased pick, as follows. A
number $x_T$ is chosen from $x_1, \ldots, x_k$ without replacement, with $T$ distributed on $[k]$
according to (51), and $x_T$ is placed in position 1. Then the next number is chosen from the
$k - 1$ remaining numbers using again the rule of $\tau$-biased pick, and placed in position 2,
etc.
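The iteration can be sketched as a small sampler. The pick rule is not restated in this section, so the weight used below, $(1-\tau)x_j + \tau(s - x_j)$ with $s$ the sum of the remaining numbers, is an assumption chosen to agree with the product formula appearing in the proof of Lemma 17. The names `tau_biased_pick_weights` and `perm_tau` are hypothetical.

```python
import random

def tau_biased_pick_weights(xs, tau):
    """Assumed unnormalised tau-biased pick weights: (1 - tau) * x_j + tau * (s - x_j)."""
    s = sum(xs)
    return [(1 - tau) * x + tau * (s - x) for x in xs]

def perm_tau(xs, tau, rng=None):
    """Arrange xs by repeated tau-biased picks without replacement."""
    rng = rng or random.Random()
    remaining = list(xs)
    arranged = []
    while remaining:
        if len(remaining) == 1:
            idx = 0  # a single remaining number is placed with certainty
        else:
            weights = tau_biased_pick_weights(remaining, tau)
            idx = rng.choices(range(len(remaining)), weights=weights)[0]
        arranged.append(remaining.pop(idx))
    return arranged
```

Under this assumed rule, `tau = 0` weights each number by its size, `tau = 0.5` gives equal weights, and `tau = 1` weights each number by the total size of the others.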

The instance $\mathrm{perm}_0$ is the size-biased permutation, which is defined more widely for
finite or infinite summable sequences $(x_1, x_2, \ldots)$, and shuffles them in the same way as
it shuffles $(s^{-1}x_1, s^{-1}x_2, \ldots)$ where $s = \sum_j x_j$. Denote by $\lhd_\xi(x_1, \ldots, x_k)$ the arrangement
of $x_1, \ldots, x_k$ in succession according to the $\lhd_\xi$-order on $[k]$.

Lemma 17 For $\xi = (1 - \tau)/\tau$ there is the compositional formula

\[ \mathrm{perm}_\tau(x_1, \ldots, x_k) \stackrel{d}{=} \lhd_\xi(\mathrm{perm}_0(x_1, \ldots, x_k)), \tag{54} \]
where on the right-hand side $\lhd_\xi$ and $\mathrm{perm}_0$ are independent.
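For small $k$ the identity in Lemma 17 can be checked by exact enumeration. The sketch below is a numerical sanity check, not a proof. It assumes the $\tau$-biased pick weight $(1-\tau)x_j + \tau(s - x_j)$ (the pick rule itself is stated outside this passage), takes `tau` as a `Fraction` in $(0, 1]$, and uses hypothetical function names throughout.

```python
from fractions import Fraction
from itertools import permutations

def chain_probability(values, tau):
    """Probability that successive tau-biased picks (assumed rule) return `values` in order."""
    prob = Fraction(1)
    remaining = list(values)
    while len(remaining) > 1:
        s = sum(remaining)
        weights = [(1 - tau) * x + tau * (s - x) for x in remaining]
        prob *= Fraction(weights[0]) / sum(weights)
        remaining.pop(0)
    return prob

def order_probability(order, xi):
    """Law of the xi-order on [k], as a product over independent initial ranks."""
    position = {item: idx for idx, item in enumerate(order)}
    prob = Fraction(1)
    for k in range(2, len(order) + 1):
        rho = 1 + sum(position[i] < position[k] for i in range(1, k))
        prob *= Fraction(xi if rho == k else 1) / (xi + k - 1)
    return prob

def lemma17_sides(xs, tau):
    """Exact laws of both sides of the formula, keyed by tuples of 0-based indices into xs."""
    xi = (1 - tau) / tau
    k = len(xs)
    index_perms = list(permutations(range(k)))
    left = {p: chain_probability([xs[i] for i in p], tau) for p in index_perms}
    right = {p: Fraction(0) for p in index_perms}
    for sigma in index_perms:  # size-biased permutation, i.e. tau = 0
        p_sigma = chain_probability([xs[i] for i in sigma], 0)
        for order in permutations(range(1, k + 1)):  # independent xi-order on positions
            rearranged = tuple(sigma[m - 1] for m in order)
            right[rearranged] += p_sigma * order_probability(order, xi)
    return left, right
```

For example, `lemma17_sides([1, 2, 3, 5], Fraction(1, 3))` returns two dictionaries that can be compared entry by entry; exact rational arithmetic avoids any floating-point tolerance.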

Proof. On each side of this identity, the distribution of the random permutation remains
the same if the sequence $x_1, \ldots, x_k$ is permuted. So it suffices to check that each scheme
returns the identity permutation with the same probability. If on the right hand side we
set

\[ \mathrm{perm}_0(x_1, \ldots, x_k) = (x_{\sigma(1)}, \ldots, x_{\sigma(k)}) \]
then the right hand scheme generates the identity permutation with probability

\[ \frac{\mathbb{E}\,\xi^{R}}{\xi(\xi+1)\cdots(\xi+k-1)} \tag{55} \]

where $R$ is the number of upper records in the sequence of ranks which generated $\sigma^{-1}$,
which equals the number of upper records in $\sigma$. Now $R = \sum_{j=1}^{k} X_j$ where $X_j$ is the
indicator of the event $A_j$ that $j$ is an upper record level for $\sigma$, meaning that there is some
$1 \le i \le k$ such that

\[ \sigma(i') < j \ \text{ for all } i' < i \ \text{ and } \ \sigma(i) = j. \]
Equivalently, $A_j$ is the event that

\[ \sigma^{-1}(j) < \sigma^{-1}(\ell) \ \text{ for each } j < \ell \le k. \]

Or again, assuming for simplicity that the $x_i$ are all distinct, which involves no loss of
generality, because the probability in question depends continuously on $(x_1, \ldots, x_k)$, $A_j$
is the event that $x_j$ precedes $x_\ell$ in the permutation $(x_{\sigma(1)}, \ldots, x_{\sigma(k)})$ for each $j < \ell \le k$.
Now it is easily shown that $(x_{\sigma(1)}, \ldots, x_{\sigma(k)})$ with $x_1$ deleted is a size-biased permutation
of $(x_2, \ldots, x_k)$, and that the same is true conditionally given $A_1$. It follows by induction
that the events $A_j$ are mutually independent, with

\[ \mathbb{P}(A_j) = x_j/(x_j + \cdots + x_k) \quad \text{for } 1 \le j \le k. \]

This allows the probability in (55) to be evaluated as

\[ \prod_{j=1}^{k} \frac{\xi x_j + x_{j+1} + \cdots + x_k}{(x_j + x_{j+1} + \cdots + x_k)(\xi + j - 1)}. \]

This is evidently the probability that $\mathrm{perm}_\tau(x_1, \ldots, x_k)$ generates the identity permutation,
and the conclusion follows. □
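The independence step used in the proof can be examined numerically for a small example. The sketch below enumerates the size-biased permutation of positive integers exactly, evaluates the events $A_j$ (index $j$ precedes every larger index), and computes the expectation of $\xi^R$. The function names are hypothetical, and integer inputs are assumed only so that exact fractions can be used.

```python
from fractions import Fraction
from itertools import permutations

def size_biased_law(xs):
    """Exact law of the size-biased permutation of positive integers xs,
    keyed by tuples of 1-based indices in order of appearance."""
    law = {}
    for sigma in permutations(range(1, len(xs) + 1)):
        prob, remaining = Fraction(1), sum(xs)
        for idx in sigma:
            prob *= Fraction(xs[idx - 1], remaining)
            remaining -= xs[idx - 1]
        law[sigma] = prob
    return law

def event_A(sigma, j):
    """A_j: index j appears before every larger index in sigma."""
    pos = {idx: m for m, idx in enumerate(sigma)}
    return all(pos[j] < pos[l] for l in range(j + 1, len(sigma) + 1))

def expected_xi_power(xs, xi):
    """E[xi^R], where R counts the events A_1, ..., A_k that occur."""
    k = len(xs)
    return sum(p * Fraction(xi) ** sum(event_A(s, j) for j in range(1, k + 1))
               for s, p in size_biased_law(xs).items())
```

Comparing `expected_xi_power(xs, xi)` with the product of the factors $(\xi x_j + x_{j+1} + \cdots + x_k)/(x_j + \cdots + x_k)$ gives a direct check of the evaluation of (55).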



The $\tau$-biased arrangement cannot be defined for an infinite positive summable sequence
$(x_1, x_2, \ldots)$, since the '$k = \infty$' instance of (51) is not a proper distribution for $\tau \ne 0$. But
the right-hand side of (54) is well-defined as an arrangement of $x_1, x_2, \ldots$ in some total order,
hence the composition $\lhd_\xi \circ \mathrm{perm}_0$ is the natural extension of the $\tau$-biased arrangement to
infinite series.

Proof of Theorem 16. We represent a finite or infinite positive sequence $(x_j)$ whose sum is
1 as an open subset of $[0, 1]$ composed of contiguous intervals of sizes $x_j$. The space of open
subsets of $[0, 1]$ is endowed with the Hausdorff distance on the complementary compact
sets. This topology is weaker than the product topology on positive series summable to
1. The limits below are understood as $n \to \infty$.

We know by a version of Kingman's correspondence [20] that $(|B_{nj}|/n, j \ge 1) \to (P_j)$
a.s. in the product topology. This readily implies $\lhd_\xi(|B_{nj}|/n, j \ge 1) \to \lhd_\xi(P_j)$ a.s. in
the Hausdorff topology, by looking at the $M$ first terms for $M$ such that these terms sum
to at least $1 - \epsilon$ with probability at least $1 - \epsilon$, then sending $\epsilon \to 0$ and $M \to \infty$. In
[5] we showed that $\mathrm{perm}_\tau(|B_{nj}|, j \ge 1) \to U$ a.s. in the Hausdorff topology. (Here the
definition of $\mathrm{perm}_\tau$ is coupled with $(|B_{nj}|, j \ge 1)$ by putting these blocks in the order
determined by uniform sampling from $U$.) The missing link is provided by Lemma 17,
from which we obtain

\[ \mathrm{perm}_\tau(|B_{nj}|, j \ge 1) \stackrel{d}{=} \lhd_\xi(|B_{nj}|, j \ge 1), \]

with the $\tau$-biased permutation $\mathrm{perm}_\tau$ applied to the finite sequence of positive block-sizes
$(|B_{nj}|, j \ge 1)$. Putting things together we conclude that $\lhd_\xi(P_j, j \ge 1) \stackrel{d}{=} U$. □

In three special cases, already identified in the previous work [6], the arrangement of
PD$(\alpha, \theta)$ (or GEM$(\alpha, \theta)$) frequencies in a multiplicatively regenerative set has a simpler
description: in the $(0, \theta)$ case the frequencies are placed in the size-biased order; in the
$(\alpha, \alpha)$ case the frequencies are 'uniformly randomly shuffled'; and in the $(\alpha, 0)$ case a
size-biased pick is placed contiguously to 1, while the other frequencies are 'uniformly
randomly shuffled'. The latter is an infinite analogue of the co-size biased arrangement
$\mathrm{perm}_1$.
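These three descriptions can be read off from the order $\lhd_{\theta/\alpha}$ of Theorem 16 by substituting the corresponding value of $\xi = \theta/\alpha$ into the distribution of the initial ranks. The following summary is a sketch of that substitution, with the $(0, \theta)$ line understood as the limit $\alpha \downarrow 0$ rather than as a case covered by the theorem:

```latex
\begin{align*}
(\alpha, \alpha):&\quad \xi = 1, && \text{all } n! \text{ orders of } [n] \text{ are equally likely};\\
(\alpha, 0):&\quad \xi = 0, && \mathbb{P}(\rho_k = k) = 0 \text{ for } k > 1,
  \text{ so item } 1 \text{ is ranked last and the rest are uniformly shuffled};\\
(0, \theta):&\quad \xi = \theta/\alpha \to \infty \ (\alpha \downarrow 0), && \rho_k = k \text{ for all } k,
  \text{ so the size-biased order is kept}.
\end{align*}
```

The value $\xi = 0$ corresponds to $\tau = 1$ under $\xi = (1 - \tau)/\tau$, which matches the co-size biased arrangement named above.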

We refer to [2, 25] for further recent developments related to ordered $(\alpha, \theta)$ partitions
and their regenerative properties.